
ansible-node-exporter's Introduction

DEPRECATED

This role has been deprecated in favor of the prometheus-community/ansible collection.

Ansible Role: node exporter


Warning

Due to limitations of galaxy.ansible.com we had to move the role to https://galaxy.ansible.com/cloudalchemy/node_exporter and use _ instead of - in the role name. This is a breaking change and, unfortunately, it affects all versions of the node_exporter role, as Ansible Galaxy doesn't offer any form of redirection. We are sorry for the inconvenience.

Description

Deploy Prometheus node exporter using Ansible.

Requirements

  • Ansible >= 2.7 (It might work on previous versions, but we cannot guarantee it)
  • gnu-tar on Mac deployer host (brew install gnu-tar)
  • Passlib is required when using the basic authentication feature (pip install passlib[bcrypt])
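
The role can be pulled in with ansible-galaxy, for example via a requirements.yml such as the following and then installed with ansible-galaxy install -r requirements.yml (the version pin is illustrative, not a recommendation):

# requirements.yml (version pin is illustrative)
- src: cloudalchemy.node_exporter
  version: 1.1.0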

Role Variables

All variables which can be overridden are stored in defaults/main.yml and are listed in the table below.

Name | Default Value | Description
node_exporter_version | 1.1.2 | Node exporter package version. Also accepts latest as a parameter.
node_exporter_binary_local_dir | "" | Enables the use of local packages instead of those distributed on GitHub. The parameter may be set to a directory where the node_exporter binary is stored on the host where Ansible is run. This overrides the node_exporter_version parameter.
node_exporter_web_listen_address | "0.0.0.0:9100" | Address on which node exporter will listen.
node_exporter_web_telemetry_path | "/metrics" | Path under which to expose metrics.
node_exporter_enabled_collectors | ["systemd",{textfile: {directory: "{{node_exporter_textfile_dir}}"}}] | List of dicts defining additionally enabled collectors and their configuration. These are added to the collectors enabled by default.
node_exporter_disabled_collectors | [] | List of collectors to disable. Note that node_exporter itself already disables some collectors by default (see the node_exporter docs).
node_exporter_textfile_dir | "/var/lib/node_exporter" | Directory used by the textfile collector. To be allowed to write metrics into this directory, users must be in the node-exp system group. Note: more information in the TROUBLESHOOTING.md guide.
node_exporter_tls_server_config | {} | Configuration for TLS authentication. Keys and values are the same as in the node_exporter docs.
node_exporter_http_server_config | {} | Config for HTTP/2 support. Keys and values are the same as in the node_exporter docs.
node_exporter_basic_auth_users | {} | Dictionary of users and passwords for basic authentication. Passwords are automatically hashed with bcrypt.
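
As a minimal sketch, overriding a few of these defaults in group_vars might look like this (values are illustrative):

# group_vars/all.yml (illustrative values)
node_exporter_version: 1.1.2
node_exporter_web_listen_address: "127.0.0.1:9100"
node_exporter_disabled_collectors:
  - wifi
node_exporter_basic_auth_users:
  monitoring: changeme   # hashed with bcrypt by the role's config template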

Example

Playbook

Use it in a playbook as follows:

- hosts: all
  roles:
    - cloudalchemy.node_exporter

TLS config

Before running the node_exporter role, the user needs to provision their own certificate and key.

- hosts: all
  pre_tasks:
    - name: Create node_exporter cert dir
      file:
        path: "/etc/node_exporter"
        state: directory
        owner: root
        group: root

    - name: Create cert and key
      openssl_certificate:
        path: /etc/node_exporter/tls.cert
        csr_path: /etc/node_exporter/tls.csr
        privatekey_path: /etc/node_exporter/tls.key
        provider: selfsigned
  roles:
    - cloudalchemy.node_exporter
  vars:
    node_exporter_tls_server_config:
      cert_file: /etc/node_exporter/tls.cert
      key_file: /etc/node_exporter/tls.key
    node_exporter_basic_auth_users:
      randomuser: examplepassword 
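
On the Prometheus side, a scrape job for an exporter configured this way needs matching TLS and basic auth settings. A minimal sketch, assuming a self-signed certificate (job name, target and CA path are illustrative):

# prometheus.yml fragment (illustrative)
scrape_configs:
  - job_name: node
    scheme: https
    tls_config:
      ca_file: /etc/prometheus/node_exporter_ca.crt
      # for a quick test against a self-signed cert you could instead set:
      # insecure_skip_verify: true
    basic_auth:
      username: randomuser
      password: examplepassword
    static_configs:
      - targets:
          - "target.example.com:9100"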

Demo site

We provide an example site that demonstrates a full monitoring solution based on Prometheus and Grafana. The repository with code and links to running instances is available on GitHub, and the site is hosted on DigitalOcean.

Local Testing

The preferred way of locally testing the role is to use Docker and molecule (v3.x). You will have to install Docker on your system. See "Get started" for a Docker package suitable for your system. Running your tests is as simple as executing molecule test.

Continuous Integration

Combining molecule and CircleCI allows us to test how new PRs will behave when used with multiple Ansible versions and multiple operating systems. It also allows us to create test scenarios for different role configurations. As a result we have quite a large test matrix which can take more time than local testing, so please be patient.

Contributing

See the contributor guideline.

Troubleshooting

See troubleshooting.

License

This project is licensed under the MIT License. See LICENSE for more details.

ansible-node-exporter's People

Contributors

agcandil-atsistemas, angristan, anisse, appliedprivacy, bittopaz, bswinnerton, cloudalchemybot, daazku, denmat, ecksun, etcet, fprojetto, friesenkiwi, jkrol2, kmille, ko-zu, laurvas, noraab, oguzhaninan, parmsib, paulfantom, porkepix, rdemachkovych, ruzickap, rwos, sarphram, sdarwin, superq, till, wikro


ansible-node-exporter's Issues

Basic Authentication password hashing

What did you do?
I'm currently trying to set up a node_exporter on a test server using both TLS and basic authentication.
I have set up a simple user in my playbook as mentioned in the README.

node_exporter_basic_auth_users:
   user: password

But this results in the following config file on the remote machine:

basic_auth_users:
  user: *0

Changing the password to something else does not affect the *0 field.

The following error is also raised when the node_exporter is starting up:
level=error ts=2020-10-14T13:31:20.226Z caller=node_exporter.go:194 err="yaml: unknown anchor '0' referenced"

The exporter works well if no users are specified.
Any idea where I could have made a mistake?

Did you expect to see something different?
No crash on startup and a correctly hashed password in the config file.

Environment

  • Role version:
    0.22.0

  • Ansible version information:

ansible 2.9.12
config file = None
configured module search path = ['/home/tdh/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
ansible python module location = /home/tdh/.local/lib/python3.8/site-packages/ansible
executable location = /home/tdh/.local/bin/ansible
python version = 3.8.2 (default, Apr 27 2020, 15:53:34) [GCC 9.3.0]
  • Variables:
node_exporter_basic_auth_users: 
  user: password

Failing tests

What happened?

TASK [ansible-node-exporter : Create node_exporter config directory] ***********
    fatal: [centos8]: UNREACHABLE! => {"changed": false, "msg": "Failed to create temporary directory.In some cases, you may have been able to authenticate and did not have permissions on the target directory. Consider changing the remote tmp path in ansible.cfg to a path rooted in \"/tmp\", for more error information use -vvv. Failed command was: ( umask 77 && mkdir -p \"` echo OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused \"process_linux.go:90: adding pid 28027 to cgroups caused \\\"failed to write 28027 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/docker/6bffe942ab289f6018c910fcb7b67957333e050d435ffbe3bbfedb72effead8d/cgroup.procs: invalid argument\\\"\": unknown/.ansible/tmp `\"&& mkdir OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused \"process_linux.go:90: adding pid 28027 to cgroups caused \\\"failed to write 28027 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/docker/6bffe942ab289f6018c910fcb7b67957333e050d435ffbe3bbfedb72effead8d/cgroup.procs: invalid argument\\\"\": unknown/.ansible/tmp/ansible-tmp-1595086095.7307746-27669-8276321652003 && echo ansible-tmp-1595086095.7307746-27669-8276321652003=\"` echo OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused \"process_linux.go:90: adding pid 28027 to cgroups caused \\\"failed to write 28027 to cgroup.procs: write /sys/fs/cgroup/cpu,cpuacct/docker/6bffe942ab289f6018c910fcb7b67957333e050d435ffbe3bbfedb72effead8d/cgroup.procs: invalid argument\\\"\": unknown/.ansible/tmp/ansible-tmp-1595086095.7307746-27669-8276321652003 `\" ), exited with result 1", "unreachable": true}

Did you expect to see something different?

A working test.

Maybe fix?

I think I had similar issues with Github Actions. Could be multiple things:

  1. I remember that, essentially, Docker-in-Docker failed because of the filesystem. You can't have overlay on overlay; one of them has to be vfs, for example.
  2. Or it's buggy Docker?
  3. Or it's Travis-CI.

Should we try to fix this?

Preflight checks

Fail when at least one collector is in both lists node_exporter_enabled_collectors and node_exporter_disabled_collectors
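
A minimal sketch of such a check for tasks/preflight.yml (task wording is illustrative; enabled entries may be plain names or single-key dicts with options, so the names are extracted first):

- name: Fail when a collector is both enabled and disabled
  assert:
    that:
      - _enabled_collector_names | intersect(node_exporter_disabled_collectors) | length == 0
    fail_msg: >-
      {{ _enabled_collector_names | intersect(node_exporter_disabled_collectors) }}
      must not appear in both node_exporter_enabled_collectors and node_exporter_disabled_collectors
  vars:
    # plain string entries are kept as-is; dict entries contribute their single key
    _enabled_collector_names: >-
      {{ (node_exporter_enabled_collectors | reject('mapping') | list)
         + (node_exporter_enabled_collectors | select('mapping') | map('list') | map('first') | list) }}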

Task "Allow Node Exporter port in SELinux on RedHat OS family" fails on centos 7

What happened?
Role apply over a fresh centos7 vagrant box fails with:

An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ImportError: No module named seobject
fatal: [centos7]: FAILED! => {"changed": false, "msg": "Failed to import the required Python library (policycoreutils-python) on localhost.localdomain's Python /usr/bin/python. Please read module documentation and install in the appropriate location. If the required library is installed, but Ansible is using the wrong Python interpreter, please consult the documentation on ansible_python_interpreter"}

Did you expect to see something different?

The role should install the missing Python dependencies (policycoreutils-python). The role works with no problem on centos8.

How to reproduce it (as minimally and precisely as possible):

Environment

  • Role version:

    commit: bde46a6273283b22d1ef7277ebbcd390b3680825

  • Ansible version information:

ansible 2.9.7
    ansible python module location = /usr/lib/python2.7/dist-packages/ansible
    executable location = /usr/bin/ansible
    python version = 2.7.17 (default, Apr 15 2020, 17:20:14) [GCC 7.5.0]
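
Until the role handles this itself, a hedged workaround is to install the SELinux Python bindings in a pre_task before the role runs (package names differ between CentOS 7 and 8):

- hosts: all
  become: true
  pre_tasks:
    - name: Install SELinux python bindings needed by seport
      package:
        name: policycoreutils-python   # python3-policycoreutils on CentOS/RHEL 8
        state: present
      when:
        - ansible_os_family == 'RedHat'
        - ansible_distribution_major_version == '7'
  roles:
    - cloudalchemy.node_exporter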
    

Checksum error

What happened?
Can't download node_exporter binary due to invalid checksum

Did you expect to see something different?
Install ok

How to reproduce it (as minimally and precisely as possible):

Environment

  • Role version:

    0.19.0

  • Ansible version information:

ansible --version
ansible 2.7.10
  config file = /home/nicolas/Work/campings/sites/rundeck/ansible/ansible.cfg
  configured module search path = ['/home/nicolas/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /home/nicolas/Applications/azure-cli/venv/lib/python3.8/site-packages/ansible
  executable location = /home/nicolas/Applications/azure-cli/venv/bin/ansible
  python version = 3.8.1 (default, Jan 22 2020, 06:38:00) [GCC 9.2.0]
  • Variables:
---
node_exporter_version: 0.18.1
  • Ansible playbook execution Logs:
TASK [cloudalchemy_node_exporter : Download node_exporter binary to local folder] **********************************************************************************************************************************
task path: /home/nicolas/Work/campings/sites/rundeck/ansible/roles/cloudalchemy_node_exporter/tasks/install.yml:21                                                                                                  
Using module file /home/nicolas/Applications/azure-cli/venv/lib/python3.8/site-packages/ansible/modules/net_tools/basics/get_url.py                                                                                 
<localhost> ESTABLISH LOCAL CONNECTION FOR USER: nicolas                                                                                                                                                            
<localhost> EXEC /bin/sh -c '/home/nicolas/Applications/azure-cli/venv/bin/python3 && sleep 0'                                                                                                                      
FAILED - RETRYING: Download node_exporter binary to local folder (5 retries left).Result was: {                                                                                                                     
    "attempts": 1,                                                                                                                                                                                                  
    "changed": false,                                                                                                                                                                                               
    "invocation": {                                                                                                                                                                                                 
        "module_args": {                                                                                                                                                                                            
            "attributes": null,                                                                                                                                                                                     
            "backup": null,                                                                                                                                                                                         
            "checksum": "sha256:['61a13b13f5a98bafd6e0dec17c6579acbc13f8a1e24a8e9206a8017edb248460",                                                                                                                
            "client_cert": null,                                                                                                                                                                                    
            "client_key": null,                                                                                                                                                                                     
            "content": null,                                                                                                                                                                                        
            "delimiter": null,                                                                                                                                                                                      
            "dest": "/tmp/node_exporter-0.18.1.linux-amd64.tar.gz",                                                                                                                                                 
            "directory_mode": null,                                                                                                                                                                                 
            "follow": false,                                                                                                                                                                                        
            "force": false,                                                                                                                                                                                         
            "force_basic_auth": false,                                                                                                                                                                              
            "group": null,                                                                                                                                                                                          
            "headers": null,                                                                                                                                                                                        
            "http_agent": "ansible-httpget",                                                                                                                                                                        
            "mode": null,
            "owner": null,
            "regexp": null,
            "remote_src": null,
            "selevel": null,
            "serole": null,
            "setype": null,
            "seuser": null,
            "sha256sum": "",
            "src": null,
            "timeout": 10,
            "tmp_dest": null,
            "unsafe_writes": null,
            "url": "https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz",
            "url_password": null,
            "url_username": null,
            "use_proxy": true,
            "validate_certs": true
        }
    },
    "msg": "The checksum for /tmp/node_exporter-0.18.1.linux-amd64.tar.gz did not match 61a13b13f5a98bafd6e0dec17c6579acbc13f8a1e24a8e9206a8017edb248460; it was b2503fd932f85f4e5baf161268854bf5d22001869b84f00fd2d
1f57b51b72424.",
    "retries": 6
}

Anything else we need to know?:

Debug code :

- block:
    - name: Get checksum list from github
      set_fact:
        _checksums: "{{ lookup('url', 'https://github.com/prometheus/node_exporter/releases/download/v' + node_exporter_version + '/sha256sums.txt', wantlist=True) | list }}"
      run_once: true

    - name: "Get checksum for {{ go_arch }} architecture"
      set_fact:
        node_exporter_checksum: "{{ item.split(' ')[0] }}"
      with_items: "{{ _checksums }}"
      when:
        - "('linux-' + go_arch + '.tar.gz') in item"

    - name: tgz archive
      debug:
        var: "('linux-' + go_arch + '.tar.gz')"

    - name: checksum
      debug:
        var: "node_exporter_checksum"

Logs :

TASK [cloudalchemy_node_exporter : Get checksum for amd64 architecture] ********************************************************************************************************************************************
task path: /home/nicolas/Work/campings/sites/rundeck/ansible/roles/cloudalchemy_node_exporter/tasks/preflight.yml:81                                                                                                
ok: [campings-sites-qa-rundeck] => (item=['61a13b13f5a98bafd6e0dec17c6579acbc13f8a1e24a8e9206a8017edb248460  node_exporter-0.18.1.darwin-386.tar.gz', '20fadb3108de0a9cc70a1333394e5be90416b4f91025f9fc66f5736335e94
398  node_exporter-0.18.1.darwin-amd64.tar.gz', 'a6c7eb64bb5f27a5567d545a1b93780f3aa72d0627751fd9f054626bb542a4b5  node_exporter-0.18.1.linux-386.tar.gz', 'b2503fd932f85f4e5baf161268854bf5d22001869b84f00fd2d1f57b
51b72424  node_exporter-0.18.1.linux-amd64.tar.gz', 'd5a28c46e74f45b9f2158f793a6064fd9fe8fd8da6e0d1e548835ceb7beb1982  node_exporter-0.18.1.linux-arm64.tar.gz', '1eecbaa2a7e58dc2a5b18e960c48842e5e158c9e2eea4d8a4b
a32b98ca2f638a  node_exporter-0.18.1.linux-armv5.tar.gz', '6f3cb593c15c12cdfaef20d7e1c61d28ef822af6fc8c85d670cb3f0a1212778a  node_exporter-0.18.1.linux-armv6.tar.gz', '5de85067f44b42b22d62b2789cb1a379ff5559649b99795cd1ba0c144b512ad0  node_exporter-0.18.1.linux-armv7.tar.gz', '9ef7c932970bc823a63347c3cdd8a34a4ef9d327cd5513600435dfd74d046755  node_exporter-0.18.1.linux-mips.tar.gz', 'c2721c1b85e3024e61f37fb2dc44a57f6d4eed8
cc0576185a1dedea20e36fb31  node_exporter-0.18.1.linux-mips64.tar.gz', 'ae262af96dd7409aeefe28f8ea6cb1b00377444837057ed67694d8fa1b75b848  node_exporter-0.18.1.linux-mips64le.tar.gz', '40860be242f563e3e10972685f1d1
654c9b5ca9686b26bde4a422f57a1ebdd18  node_exporter-0.18.1.linux-mipsle.tar.gz', 'b41f860dbe23b72cf2ae939dd6bb43ea3ddde268f5a964cf6f8d490fed1ed034  node_exporter-0.18.1.linux-ppc64.tar.gz', '27996a62327e07041b5dd2
f09d6054c7c21244e39358da5d9b44b96daf6a2bc0  node_exporter-0.18.1.linux-ppc64le.tar.gz', '0bc212b9db6c2201b2b38d46de2d4cc75b7f4648d7616a87d7616e85f0d6cba4  node_exporter-0.18.1.linux-s390x.tar.gz', 'c831801b573075
0177893a9866416ebb68977a8fd5a7b5305e39ef1162e146a9  node_exporter-0.18.1.netbsd-386.tar.gz', '4772c8e2d13935d2bcfa8ad1fd64b8ca5d2cc5d71bbee6dd4ef04306017c6368  node_exporter-0.18.1.netbsd-amd64.tar.gz']) => {    
    "ansible_facts": {                                                                                                                                                                                              
        "node_exporter_checksum": "['61a13b13f5a98bafd6e0dec17c6579acbc13f8a1e24a8e9206a8017edb248460"                                                                                                              
    },                                                                                                                                                                                                              
    "changed": false,                                                                                                                                                                                               
    "item": "['61a13b13f5a98bafd6e0dec17c6579acbc13f8a1e24a8e9206a8017edb248460  node_exporter-0.18.1.darwin-386.tar.gz', '20fadb3108de0a9cc70a1333394e5be90416b4f91025f9fc66f5736335e94398  node_exporter-0.18.1.da
rwin-amd64.tar.gz', 'a6c7eb64bb5f27a5567d545a1b93780f3aa72d0627751fd9f054626bb542a4b5  node_exporter-0.18.1.linux-386.tar.gz', 'b2503fd932f85f4e5baf161268854bf5d22001869b84f00fd2d1f57b51b72424  node_exporter-0.18
.1.linux-amd64.tar.gz', 'd5a28c46e74f45b9f2158f793a6064fd9fe8fd8da6e0d1e548835ceb7beb1982  node_exporter-0.18.1.linux-arm64.tar.gz', '1eecbaa2a7e58dc2a5b18e960c48842e5e158c9e2eea4d8a4ba32b98ca2f638a  node_exporte
r-0.18.1.linux-armv5.tar.gz', '6f3cb593c15c12cdfaef20d7e1c61d28ef822af6fc8c85d670cb3f0a1212778a  node_exporter-0.18.1.linux-armv6.tar.gz', '5de85067f44b42b22d62b2789cb1a379ff5559649b99795cd1ba0c144b512ad0  node_e
xporter-0.18.1.linux-armv7.tar.gz', '9ef7c932970bc823a63347c3cdd8a34a4ef9d327cd5513600435dfd74d046755  node_exporter-0.18.1.linux-mips.tar.gz', 'c2721c1b85e3024e61f37fb2dc44a57f6d4eed8cc0576185a1dedea20e36fb31  n
ode_exporter-0.18.1.linux-mips64.tar.gz', 'ae262af96dd7409aeefe28f8ea6cb1b00377444837057ed67694d8fa1b75b848  node_exporter-0.18.1.linux-mips64le.tar.gz', '40860be242f563e3e10972685f1d1654c9b5ca9686b26bde4a422f57a
1ebdd18  node_exporter-0.18.1.linux-mipsle.tar.gz', 'b41f860dbe23b72cf2ae939dd6bb43ea3ddde268f5a964cf6f8d490fed1ed034  node_exporter-0.18.1.linux-ppc64.tar.gz', '27996a62327e07041b5dd2f09d6054c7c21244e39358da5d9b
44b96daf6a2bc0  node_exporter-0.18.1.linux-ppc64le.tar.gz', '0bc212b9db6c2201b2b38d46de2d4cc75b7f4648d7616a87d7616e85f0d6cba4  node_exporter-0.18.1.linux-s390x.tar.gz', 'c831801b5730750177893a9866416ebb68977a8fd5
a7b5305e39ef1162e146a9  node_exporter-0.18.1.netbsd-386.tar.gz', '4772c8e2d13935d2bcfa8ad1fd64b8ca5d2cc5d71bbee6dd4ef04306017c6368  node_exporter-0.18.1.netbsd-amd64.tar.gz']"                                     
}                                                                                                                                                                                                                   
Read vars_file '../env_vars/{{ env }}.yml'

TASK [cloudalchemy_node_exporter : tgz archive] ********************************************************************************************************************************************************************
task path: /home/nicolas/Work/campings/sites/rundeck/ansible/roles/cloudalchemy_node_exporter/tasks/preflight.yml:88
ok: [campings-sites-qa-rundeck] => {
    "('linux-' + go_arch + '.tar.gz')": "linux-amd64.tar.gz"
}
Read vars_file '../env_vars/{{ env }}.yml'

TASK [cloudalchemy_node_exporter : checksum] ***********************************************************************************************************************************************************************
task path: /home/nicolas/Work/campings/sites/rundeck/ansible/roles/cloudalchemy_node_exporter/tasks/preflight.yml:92
ok: [campings-sites-qa-rundeck] => {
    "node_exporter_checksum": "['61a13b13f5a98bafd6e0dec17c6579acbc13f8a1e24a8e9206a8017edb248460"
}
Read vars_file '../env_vars/{{ env }}.yml'

Role seems to set the Darwin checksum.
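
A hedged workaround is to fetch the checksum file with the uri module and split it into individual lines before filtering, so the per-architecture match is applied to one line at a time (task names mirror the role but are illustrative):

- name: Get checksum list from github
  uri:
    url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/sha256sums.txt"
    return_content: true
  register: _raw_checksums
  run_once: true

- name: "Get checksum for {{ go_arch }} architecture"
  set_fact:
    node_exporter_checksum: "{{ item.split(' ')[0] }}"
  with_items: "{{ _raw_checksums.content.splitlines() }}"
  when: "('linux-' + go_arch + '.tar.gz') in item"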

Support for darwin (macOS)

Would it be possible to add support for darwin/macOS?

I just switched to using this role for our Linux-based hosts, but we also have quite a substantial number of nodes running Darwin.

Role fails to unpack node_exporter binary

What happened?
Role fails to unpack node_exporter binary because of file permissions.

Maybe related to this security fix in ansible: ansible/ansible#67794

Did you expect to see something different?
I was expecting the installation to succeed, as with previous versions of Ansible.

How to reproduce it (as minimally and precisely as possible):
Use Ansible version 2.9.12.

Environment

  • Role version:

    0.21.3

  • Ansible version information:

    2.9.12

  • Variables:

insert role variables relevant to the issue
  • Ansible playbook execution Logs:
    qemu: TASK [node-exporter : Download node_exporter binary to local folder] ***********
    qemu: changed: [localhost]
==> qemu: [WARNING]: File '/tmp/node_exporter-1.0.1.linux-amd64.tar.gz' created with
==> qemu: default permissions '600'. The previous default was '666'. Specify 'mode' to
==> qemu: avoid this warning.
    qemu:
    qemu: TASK [node-exporter : Unpack node_exporter binary] *****************************
    qemu: fatal: [localhost]: FAILED! => {"changed": false, "msg": "an error occurred while trying to read the file '/tmp/node_exporter-1.0.1.linux-amd64.tar.gz': [Errno 13] Permission denied: '/tmp/node_exporter-1.0.1.linux-amd64.tar.gz'"}
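
A hedged workaround until the role sets an explicit mode is to make the downloaded archive world-readable, so the later unarchive step can still read it under the stricter default introduced by the security fix linked above (URL and destination are illustrative):

- name: Download node_exporter binary to local folder
  get_url:
    url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/node_exporter-{{ node_exporter_version }}.linux-amd64.tar.gz"
    dest: "/tmp/node_exporter-{{ node_exporter_version }}.linux-amd64.tar.gz"
    mode: "0644"   # avoid the new 0600 default for temporary files
  become: false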

Problems with `node_exporter_textfile_dir`

When node_exporter_textfile_dir is changed to a different directory but node_exporter_enabled_collectors is left at its default, node_exporter ends up with a wrong configuration in its systemd unit file. This is probably caused by the configuration in node_exporter_enabled_collectors taking the default node_exporter_textfile_dir value rather than the overwritten one.

Possible fixes:

  1. Use a static node_exporter_textfile_dir and don't allow any changes.
  2. Detect if node_exporter_textfile_dir has a different value than the default one and change node_exporter_enabled_collectors accordingly.
  3. Detect if node_exporter_textfile_dir has a different value than the default one and fail the role with a user notification to accommodate this change in a custom node_exporter_enabled_collectors.
  4. Other

Either way we should probably check for this issue in tasks/preflight.yml since that's what this file is for.

cc: @SuperQ

Issue running the Role

When I tried to apply the role:

# ------------------------------------------------------------------------------
# Node Exporter Service
# ------------------------------------------------------------------------------
- hosts:
    - node-exporters
  roles:
    - role: cloudalchemy.node-exporter
      node_exporter_version: 0.17.0
      node_exporter_enabled_collectors:
        - supervisord
        - systemd
        - tcpstat
        - textfile

# ------------------------------------------------------------------------------
# Firewall Rules
# ------------------------------------------------------------------------------
- hosts:
    - node-exporters
  tasks:
    - name: Open 9100/tcp port in the firewall
      firewalld:
        port: 9100/tcp
        permanent: yes
        immediate: yes
        state: enabled
      when: ansible_os_family == 'RedHat'

I got this error:

<localhost> EXEC /bin/sh -c 'chmod u+x /Users/ppadial/.ansible/tmp/ansible-tmp-1554487030.581331-239264549754786/ /Users/ppadial/.ansible/tmp/ansible-tmp-1554487030.581331-239264549754786/AnsiballZ_get_url.py && sleep 0'
<localhost> EXEC /bin/sh -c 'sudo -H -S  -p "[sudo via ansible, key=scrsjkfwaegntxilsjtpmzdkousxbdah] password: " -u root /bin/sh -c '"'"'echo BECOME-SUCCESS
 /usr/local/Cellar/ansible/2.7.9/libexec/bin/python3.7 /Users/ppadial/.ansible/tmp/ansible-tmp-1554487030.581331-239264549754786/AnsiballZ_get_url.py'"'"' && sleep 0'
<localhost> EXEC /bin/sh -c 'echo ~ppadial && sleep 0'
<localhost> EXEC /bin/sh -c 'rm -f -r /Users/ppadial/.ansible/tmp/ansible-tmp-1554487028.519246-130314029194652/ > /dev/null 2>&1 && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /Users/ppadial/.ansible/tmp/ansible-tmp-1554487032.687249-126806156839976 `" && echo ansible-tmp-1554487032.687249-126806156839976="` echo /Users/ppadial/.ansible/tmp/ansible-tmp-1554487032.687249-126806156839976 `" ) && sleep 0'
FAILED - RETRYING: Download node_exporter binary to local folder (3 retries left).Result was: {
    "attempts": 3,
    "changed": false,
    "module_stderr": "Sorry, try again.\n[sudo via ansible, key=akey] password: \nsudo: 1 incorrect password attempt\n",
    "module_stdout": "",
    "msg": "MODULE FAILURE\nSee stdout/stderr for the exact error",
    "rc": 1,
    "retries": 6
}

Any ideas?

Nice'ness fails node_exporter

Hello!

Been trying to use this role to install the node exporter on a few servers (VMs).

Ultimately, the Ansible run completes, but the node exporter is not started.

When inspecting systemctl status I see the job has reached its start limit; the journal has this:

Oct 25 11:11:03 prometheus.test systemd[14341]: Failed at step NICE spawning /usr/local/bin/node_exporter: Permission denied
Oct 25 11:11:03 prometheus.test systemd[1]: node_exporter.service: main process exited, code=exited, status=201/NICE
Oct 25 11:11:03 prometheus.test systemd[1]: Unit node_exporter.service entered failed state.
Oct 25 11:11:03 prometheus.test systemd[1]: node_exporter.service failed.
Oct 25 11:11:03 prometheus.test systemd[1]: node_exporter.service holdoff time over, scheduling restart.
Oct 25 11:11:03 prometheus.test systemd[1]: Started Prometheus Node Exporter.
Oct 25 11:11:03 prometheus.test systemd[1]: Starting Prometheus Node Exporter...
...

I tried starting the node exporter as root, which works; it just fails at nice'ing the process via systemd when using the node-exp user (which is also set up by this role).

I have tried to google how to check which permissions are needed by a user, but I am also stuck checking Systemd internals to see what is happening.

I think I am not doing anything very custom here; I am running this role with a few configuration settings:

  vars:
    node_exporter_web_listen_address: "0.0.0.0:9100"
    node_exporter_textfile_dir: "/var/lib/node_exporter"
    node_exporter_disabled_collectors:
      - diskstats
      - mdadm
      - nfs
      - nfsd
      - wifi
      - xfs
      - zfs

I noticed that the node_exporter.service installed by this role is the only one nice'ing the service. All other examples/tutorials etc. don't show that. I suspect there is a good reason to do this, hence raising the issue here.

upstream bug in to_nice_yaml affects template generation task: "value must be a string"

What happened?

Task "Copy the node_exporter config file" fails with the following error:

    "msg": "AnsibleError: Unexpected templating type error occurred on (---\n{{ ansible_managed | comment }}\n{% if node_exporter_tls_server_config | length > 0 %}\ntls_server_config:\n{{ node_exporter_tls_server_config | to_nice_yaml | indent(2, true) }}\n{% endif %}\n\n{% if node_exporter_http_server_config | length > 0 %}\nhttp_server_config:\n{{ node_exporter_http_server_config | to_nice_yaml | indent(2, true) }}\n{% endif %}\n\n{% if node_exporter_basic_auth_users | length > 0 %}\nbasic_auth_users:\n{% for k, v in node_exporter_basic_auth_users.items() %}\n  {{ k }}: {{ v | password_hash('bcrypt', ('abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ1234567890' | shuffle(seed=inventory_hostname) | join)[:22], rounds=9) }}\n{% endfor %}\n{% endif %}\n): value must be a string"

Removing | to_nice_yaml | indent(2, true) from the config.yaml.j2 template avoids the error. Resulting config:

tls_server_config:
{'cert_file': '/etc/node_exporter/tls.cert', 'key_file': '/etc/node_exporter/tls.key'}

I suspect this is related to:
ansible/ansible#66916

Did you expect to see something different?

config.yaml is generated without errors.

How to reproduce it (as minimally and precisely as possible):

  • ensure key and cert files are in place

used playbook:

- hosts: server
  roles:
    - ansible-node-exporter
  vars:
    node_exporter_binary_local_dir: /home/user/prometheus_bin/node_exporter
    node_exporter_tls_server_config:
      cert_file: '/etc/node_exporter/tls.cert'
      key_file: '/etc/node_exporter/tls.key'

Ensure your Ansible version matches the one below.

Environment

  • Role version:

0.21.0

  • Ansible version information:
ansible 2.9.9
  config file = /home/user/.ansible.cfg
  configured module search path = ['/home/user/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python3.7/site-packages/ansible
  executable location = /usr/bin/ansible
  python version = 3.7.7 (default, Mar 13 2020, 10:23:39) [GCC 9.2.1 20190827 (Red Hat 9.2.1-1)]

python3-pyyaml-5.3.1-1.fc31.x86_64

Hardcode node-exporter username and group

Hello

In all other cloudalchemy roles the username and group are hard-coded.
In this role the username and group are freely definable.
Is there a reason for this?
I think it should be the same style in all roles for consistency.
Either the username and group are hard-coded or they are freely definable.
Do we have an opinion on this?

Greetings

Socket activation

What if we used systemd's built-in "socket activation" feature for starting node_exporter?

This way node_exporter wouldn't be started at boot; instead, systemd would invoke it on the first Prometheus scrape. The end result is one process less when it is not used.

@SuperQ what do you think about it?
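
A hedged sketch of what that could look like, assuming a node_exporter build that supports the --web.systemd-socket flag (unit contents, paths and port are illustrative):

- name: Install node_exporter socket unit
  copy:
    dest: /etc/systemd/system/node_exporter.socket
    mode: "0644"
    content: |
      [Unit]
      Description=Node Exporter socket

      [Socket]
      ListenStream=9100

      [Install]
      WantedBy=sockets.target

# the service's ExecStart would then use --web.systemd-socket instead of
# --web.listen-address, and only the socket gets enabled at boot
- name: Enable node_exporter socket activation
  systemd:
    name: node_exporter.socket
    enabled: true
    state: started
    daemon_reload: true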

Failed to get information on remote file

Trying to install node-exporter on two servers:

roles:
- cloudalchemy.node-exporter

and get a strange error:

TASK [cloudalchemy.node-exporter : Download Node exporter binary to local folder] ****************************************************************************************************
FAILED - RETRYING: Download Node exporter binary to local folder (5 retries left).
FAILED - RETRYING: Download Node exporter binary to local folder (5 retries left).
FAILED - RETRYING: Download Node exporter binary to local folder (4 retries left).
FAILED - RETRYING: Download Node exporter binary to local folder (4 retries left).
FAILED - RETRYING: Download Node exporter binary to local folder (3 retries left).
FAILED - RETRYING: Download Node exporter binary to local folder (3 retries left).
FAILED - RETRYING: Download Node exporter binary to local folder (2 retries left).
FAILED - RETRYING: Download Node exporter binary to local folder (2 retries left).
FAILED - RETRYING: Download Node exporter binary to local folder (1 retries left).
FAILED - RETRYING: Download Node exporter binary to local folder (1 retries left).
fatal: [runner-1 -> localhost]: FAILED! => changed=false 
  attempts: 5
  module_stderr: |-
    /bin/sh: /usr/bin/python2: No such file or directory
  module_stdout: ''
  msg: |-
    The module failed to execute correctly, you probably need to set the interpreter.
    See stdout/stderr for the exact error
  rc: 127
fatal: [runner-2 -> localhost]: FAILED! => changed=false 
  attempts: 5
  module_stderr: |-
    /bin/sh: /usr/bin/python2: No such file or directory
  module_stdout: ''
  msg: |-
    The module failed to execute correctly, you probably need to set the interpreter.
    See stdout/stderr for the exact error
  rc: 127

Server 1: Linux runner-2 4.9.0-7-amd64 #1 SMP Debian 4.9.110-1 (2018-07-05) x86_64 GNU/Linux
Server 2: Linux runner 4.15.0-34-generic #37-Ubuntu SMP Mon Aug 27 15:21:48 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Localhost: Linux localhost.localdomain 4.18.14-200.fc28.x86_64 #1 SMP Mon Oct 15 13:16:27 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

There is a /tmp folder and a /usr/bin/python2 on both servers and on the localhost.

What's wrong?
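
Since the failing tasks are delegated to localhost, a hedged fix is to point the controller at an interpreter that actually exists there, for example via host_vars (the path is illustrative):

# host_vars/localhost.yml (illustrative)
ansible_python_interpreter: /usr/bin/python3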

node_exporter can not be updated when node_exporter_binary_local_dir is set

What happened?

After changing

node_exporter_binary_local_dir: node_exporter-1.0.0.linux-amd64

to

node_exporter_binary_local_dir: node_exporter-1.0.1.linux-amd64

I somewhat expected the role to copy the new version to the servers,
but install.yml is ignored after the first playbook run and the version on the servers is not updated.

Moving the task "propagate locally distributed node_exporter binary" from install.yml to main.yml right after the install.yml import would solve this.

How to reproduce it (as minimally and precisely as possible):
run this playbook:

- hosts: server
  roles:
    - ansible-node-exporter
  vars:
    node_exporter_binary_local_dir: /home/user/prometheus_bin/node_exporter-1.0.0.linux-amd64

then run this playbook:

- hosts: server
  roles:
    - ansible-node-exporter
  vars:
    node_exporter_binary_local_dir: /home/user/prometheus_bin/node_exporter-1.0.1.linux-amd64

Environment

  • Role version:

0.21.0

  • Ansible playbook execution Logs:

Task is skipped on second run since the node_exporter file exists on the server already:

TASK [ansible-node-exporter : Propagate node_exporter binaries] ********************************************************************************************
skipping: [server]

Quoting the collector parameter ignored-mount-points produces unwanted results

What happened?

Setting the filesystem collector parameter ignored-mount-points puts the value in quotes in the service file, which is then fed verbatim into the regexp parser and makes the expression useless.

Is there some way to work around this obstacle?
(detailed description below)

Did you expect to see something different?

As per these two node_exporter issues, quoting the regexp in the service file is the wrong thing to do:

prometheus/node_exporter#911 (comment)
prometheus/node_exporter#1000 (comment)

How to reproduce it (as minimally and precisely as possible):

Use the example from the defaults/main.yml file:

node_exporter_enabled_collectors:
- filesystem:
    ignored-mount-points: "^/(sys|proc|dev)($|/)"

This produces a .service file with the following ExecStart line:

ExecStart=/usr/local/bin/node_exporter \
--collector.textfile \
    --collector.textfile.directory=/opt/prometheus_data \
--collector.filesystem \
    --collector.filesystem.ignored-mount-points='^/(sys|proc|dev)($|/)' \
    --web.listen-address=0.0.0.0:9100

Result:

# curl -s localhost:9100/metrics|grep 'mountpoint="/sys'
node_filesystem_avail_bytes{device="tmpfs",fstype="tmpfs",mountpoint="/sys/fs/cgroup"} 9.0760503296e+10
node_filesystem_device_error{device="tmpfs",fstype="tmpfs",mountpoint="/sys/fs/cgroup"} 0
node_filesystem_files{device="tmpfs",fstype="tmpfs",mountpoint="/sys/fs/cgroup"} 2.2158326e+07
node_filesystem_files_free{device="tmpfs",fstype="tmpfs",mountpoint="/sys/fs/cgroup"} 2.215831e+07
node_filesystem_free_bytes{device="tmpfs",fstype="tmpfs",mountpoint="/sys/fs/cgroup"} 9.0760503296e+10
node_filesystem_readonly{device="tmpfs",fstype="tmpfs",mountpoint="/sys/fs/cgroup"} 1
node_filesystem_size_bytes{device="tmpfs",fstype="tmpfs",mountpoint="/sys/fs/cgroup"} 9.0760503296e+10

If the quotes are manually removed from the switch line:

--collector.filesystem.ignored-mount-points=^/(sys|proc|dev)($|/) \

then the result is as expected (mountpoint is ignored):

# systemctl daemon-reload
# systemctl restart node_exporter.service
# curl -s localhost:9100/metrics|grep 'mountpoint="/sys'
#
  • Role version:

    version: 0.21.5

  • Ansible version information:

ansible --version
ansible 2.8.2
  config file = /root/src/clip-hpc/ansible.cfg
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /root/src/clip-hpc-venv/lib/python3.6/site-packages/ansible
  executable location = /root/src/clip-hpc-venv/bin/ansible
  python version = 3.6.8 (default, Dec  5 2019, 15:45:45) [GCC 8.3.1 20191121 (Red Hat 8.3.1-5)]

Check is not working

Hi,

--check is not working with this playbook

fatal: [vpnssl-d01-mon]: FAILED! => {"msg": "The conditional check '(not __node_exporter_is_installed.stat.exists) or (__node_exporter_current_version_output.stderr_lines[0].split(\" \")[2] != node_exporter_version)' failed. The error was: error while evaluating conditional ((not __node_exporter_is_installed.stat.exists) or (__node_exporter_current_version_output.stderr_lines[0].split(\" \")[2] != node_exporter_version)): 'dict object' has no attribute 'stderr_lines'\n\nThe error appears to be in '/home/bdupuis/git/vpnssl/ansible/roles/cloudalchemy.node-exporter/tasks/install.yml': line 2, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n---\n- name: Install dependencies\n  ^ here\n"}

I think we should add "check_mode: no" to "Check if node_exporter is installed" and the other tasks that register command output.
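
A hedged sketch of that suggestion for the install tasks (task names mirror the role, details are abridged):

- name: Check if node_exporter is installed
  stat:
    path: /usr/local/bin/node_exporter
  register: __node_exporter_is_installed
  check_mode: false   # gather real state even under --check

- name: Check current node_exporter version
  command: /usr/local/bin/node_exporter --version
  changed_when: false
  check_mode: false   # run even under --check so stderr_lines gets registered
  register: __node_exporter_current_version_output
  when: __node_exporter_is_installed.stat.exists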

Best regards

Role tasks mess up the entire system's permissions

It looks like, under certain circumstances, this role wreaks havoc where it is provisioned: it changes the ownership of the entire root filesystem (!?).

Example output:

TASK [cloudalchemy.node-exporter : Install dependencies] *********************************************************************************************************************

TASK [cloudalchemy.node-exporter : Create the node_exporter group] ***********************************************************************************************************
changed: [the_vm_ip]

TASK [cloudalchemy.node-exporter : Create the node_exporter user] ************************************************************************************************************
fatal: [the_vm_ip]: FAILED! => {"changed": false, "msg": "[Errno 1] Operation not permitted: '/proc/sys'"}

After this failure, I SSH'd into the system and found this:

system-username@vm-hostname:~$ ls -la /
total 88
drwxr-xr-x  23 node-exp users  4096 Oct 16 12:43 .
drwxr-xr-x  23 node-exp users  4096 Oct 16 12:43 ..
drwxr-xr-x   2 node-exp users  4096 Oct 10 19:31 bin
drwxr-xr-x   3 node-exp users  4096 Oct 10 19:32 boot
drwxr-xr-x  16 node-exp users  3580 Oct 16 12:43 dev
drwxr-xr-x 103 node-exp users  4096 Oct 16 13:01 etc
drwxr-xr-x  18 node-exp users  4096 Oct 16 12:59 home
lrwxrwxrwx   1 root     root     31 Oct 10 19:32 initrd.img -> boot/initrd.img-4.15.0-1046-gcp
lrwxrwxrwx   1 root     root     31 Oct 10 19:32 initrd.img.old -> boot/initrd.img-4.15.0-1046-gcp
drwxr-xr-x  20 node-exp users  4096 Oct 16 12:51 lib
drwxr-xr-x   2 node-exp users  4096 Oct 10 19:29 lib64
drwx------   2 node-exp users 16384 Oct 10 19:31 lost+found
drwxr-xr-x   2 node-exp users  4096 Oct 10 19:29 media
drwxr-xr-x   2 node-exp users  4096 Oct 10 19:29 mnt
drwxr-xr-x   2 node-exp users  4096 Oct 10 19:29 opt
dr-xr-xr-x 717 node-exp users     0 Oct 16 12:43 proc
drwx------   4 node-exp users  4096 Oct 16 12:51 root
drwxr-xr-x  23 node-exp users   940 Oct 16 13:02 run
drwxr-xr-x   2 node-exp users  4096 Oct 10 19:31 sbin
drwxr-xr-x   2 node-exp users  4096 Oct 16 12:43 snap
drwxr-xr-x   2 node-exp users  4096 Oct 10 19:29 srv
dr-xr-xr-x  13 node-exp users     0 Oct 16 12:48 sys
drwxrwxrwt   8 node-exp users  4096 Oct 16 13:01 tmp
drwxr-xr-x  10 node-exp users  4096 Oct 10 19:29 usr
drwxr-xr-x  14 node-exp users  4096 Oct 16 12:51 var
lrwxrwxrwx   1 root     root     28 Oct 10 19:32 vmlinuz -> boot/vmlinuz-4.15.0-1046-gcp
lrwxrwxrwx   1 root     root     28 Oct 10 19:32 vmlinuz.old -> boot/vmlinuz-4.15.0-1046-gcp

The above is a VM running in GCP. I have another VM running in GCP where I have run the same version of the role against it and this did not happen.

This is the playbook where this is happening:

# Playbook where the issue happens
---
- hosts: "{{ hosts_group }}"
  gather_facts: true
  become: true
  roles:
    - role: lifeofguenter.oracle-java
      become: yes
    - role: jobscore.beats
      become: yes
    - role: torian.logstash
      become: yes
    - role: cloudalchemy.node-exporter

And this is the playbook where this does not happen:

# Playbook where problem does not occur
---
- hosts: "{{ hosts_group }}"
  gather_facts: yes
  roles:
    - role: jobscore.beats
      become: yes
    - role: torian.logstash
      become: yes
    - role: cloudalchemy.node-exporter

And this is the requirements.yml file used in both projects:

---
- src: https://github.com/jobscore/ansible-role-beats/archive/v0.1.1.tar.gz
  name: jobscore.beats
- src: https://github.com/torian/ansible-role-logstash/archive/1.2.0.tar.gz
  name: torian.logstash
- src: https://github.com/lifeofguenter/ansible-role-oracle-java/archive/1.0.2.tar.gz
  name: lifeofguenter.oracle-java
- src: https://github.com/cloudalchemy/ansible-node-exporter/archive/0.15.0.tar.gz
  name: cloudalchemy.node-exporter

The only visible difference is the become: true defined in the playbook where this happens. But still, why would the role change the permissions of the entire system? 🤔

Problem downloading sha256sum.txt from github

What happened?
When running the ansible-node-exporter role, upon reaching this task:

    - name: Get checksum list from github
      set_fact:
        _checksums: "{{ lookup('url', 'https://github.com/prometheus/node_exporter/releases/download/v' + node_exporter_version + '/sha256sums.txt', wantlist=True) | list }}"
      run_once: true

I get the following issue.

[19:01:16] cloudalchemy.node-exporter : Get checksum list from github | host | FAILED | 1742ms
{

It seemed that the result returned was a 302, which the url lookup did not follow. The option to follow redirects seemed to have been added only in Ansible devel. Switching to use uri worked.

Did you expect to see something different?

Expect this to succeed.

How to reproduce it (as minimally and precisely as possible):
Attempt to run/use the role.

Environment
Ubuntu 18.04

  • Role version:
https://github.com/cloudalchemy/ansible-node-exporter/releases/tag/0.19.0
  • Ansible version information:
root@c86f70a56fc7:/work# ansible --version
ansible 2.7.1
  config file = None
  configured module search path = ['/root/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.6/dist-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 3.6.9 (default, Nov  7 2019, 10:44:02) [GCC 8.3.0]
  • Variables:
https://github.com/cloudalchemy/ansible-node-exporter/blob/master/tasks/preflight.yml#L78
  • Ansible playbook execution Logs:
[19:01:16] cloudalchemy.node-exporter : Get checksum list from github | host | FAILED | 1742ms
{
  - msg: An unhandled exception occurred while running the lookup plugin 'url'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Received HTTP error for https://github.com/prometheus/node_exporter/releases/download/v0.18.1/sha256sums.txt : HTTP Error 400: Bad Request

Anything else we need to know?:
The workaround at the moment is to patch the task with:

    - name: Download checksum list from github
      uri:
        url: "{{ 'https://github.com/prometheus/node_exporter/releases/download/v' + node_exporter_version + '/sha256sums.txt' }}"
        method: GET
        return_content: true
        status_code: 200
        body_format: json
      register: _raw_checksum
      until: _raw_checksum.status == 200
      retries: 5
      run_once: true

    - name: "Get checksum list from github results"
      set_fact:
        _checksums: "{{ _raw_checksum.content.split('\n') }}"
      run_once: true

disruptive role name change on galaxy

Commit 0d9f503 changed the role name on Ansible Galaxy from node-exporter to node_exporter. I understand that role names with dashes are deprecated, but this change is disruptive and I don't even see it mentioned in the commit message! This is not only breaking projects using the latest version: even those pinned to a release tag are now failing to download the role. Documentation is now out of sync as well. What's the maintainers' opinion on this?

Optimizations

(Sorry for the lack of a better headline/title...)

I had a question about this role, specifically about:
https://github.com/cloudalchemy/ansible-node-exporter/blob/master/tasks/install.yml#L24-L35

Is there a specific reason this is delegated to localhost? It seems downloading and unpacking on the target node would be much better in most cases?

Then, this part:
https://github.com/cloudalchemy/ansible-node-exporter/blob/master/tasks/install.yml#L52-L60

Seems to happen almost every time. Can we add a "creates" and an initial version check? Any thoughts? I am asking because this setup seems to consume a substantial amount of time on my Ansible runs; I am trying to avoid building special-purpose playbooks right now and would prefer to be able to re-run this continuously/whenever with the same predictable outcome.

Installation fails

Installation fails in Ansible AWX 9.1.0 Environment

/tmp/node_exporter-0.18.1.linux-amd64.tar.gz => exists on the target node after run

TASK [cloudalchemy.node-exporter : Download node_exporter binary to local folder] ***
ok: [elab-leaf1 -> {{ inventory_hostname }}.{{ host_domain }}]
ok: [elab-spine2 -> {{ inventory_hostname }}.{{ host_domain }}]
ok: [elab-spine1 -> {{ inventory_hostname }}.{{ host_domain }}]
ok: [elab-leaf3 -> {{ inventory_hostname }}.{{ host_domain }}]
ok: [elab-leaf4 -> {{ inventory_hostname }}.{{ host_domain }}]
ok: [elab-leaf2 -> {{ inventory_hostname }}.{{ host_domain }}]
ok: [elab-extleaf1 -> {{ inventory_hostname }}.{{ host_domain }}]
ok: [elab-egress-leaf1 -> {{ inventory_hostname }}.{{ host_domain }}]
ok: [elab-egress-leaf2 -> {{ inventory_hostname }}.{{ host_domain }}]
TASK [cloudalchemy.node-exporter : Unpack node_exporter binary] ****************
fatal: [elab-spine2 -> {{ inventory_hostname }}.{{ host_domain }}]: FAILED! => {"changed": false, "msg": "Could not find or access '/tmp/node_exporter-0.18.1.linux-amd64.tar.gz' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
fatal: [elab-leaf1 -> {{ inventory_hostname }}.{{ host_domain }}]: FAILED! => {"changed": false, "msg": "Could not find or access '/tmp/node_exporter-0.18.1.linux-amd64.tar.gz' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
fatal: [elab-spine1 -> {{ inventory_hostname }}.{{ host_domain }}]: FAILED! => {"changed": false, "msg": "Could not find or access '/tmp/node_exporter-0.18.1.linux-amd64.tar.gz' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
fatal: [elab-leaf3 -> {{ inventory_hostname }}.{{ host_domain }}]: FAILED! => {"changed": false, "msg": "Could not find or access '/tmp/node_exporter-0.18.1.linux-amd64.tar.gz' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
fatal: [elab-leaf4 -> {{ inventory_hostname }}.{{ host_domain }}]: FAILED! => {"changed": false, "msg": "Could not find or access '/tmp/node_exporter-0.18.1.linux-amd64.tar.gz' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
fatal: [elab-leaf2 -> {{ inventory_hostname }}.{{ host_domain }}]: FAILED! => {"changed": false, "msg": "Could not find or access '/tmp/node_exporter-0.18.1.linux-amd64.tar.gz' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
fatal: [elab-extleaf1 -> {{ inventory_hostname }}.{{ host_domain }}]: FAILED! => {"changed": false, "msg": "Could not find or access '/tmp/node_exporter-0.18.1.linux-amd64.tar.gz' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
fatal: [elab-egress-leaf1 -> {{ inventory_hostname }}.{{ host_domain }}]: FAILED! => {"changed": false, "msg": "Could not find or access '/tmp/node_exporter-0.18.1.linux-amd64.tar.gz' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
fatal: [elab-egress-leaf2 -> {{ inventory_hostname }}.{{ host_domain }}]: FAILED! => {"changed": false, "msg": "Could not find or access '/tmp/node_exporter-0.18.1.linux-amd64.tar.gz' on the Ansible Controller.\nIf you are using a module and expect the file to exist on the remote, see the remote_src option"}
{
    "group": "ansible-cumulus",
    "uid": 999,
    "url": "https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz",
    "changed": false,
    "elapsed": 0,
    "dest": "/tmp/node_exporter-0.18.1.linux-amd64.tar.gz",
    "state": "file",
    "gid": 997,
    "mode": "0644",
    "invocation": {
        "module_args": {
            "directory_mode": null,
            "force": false,
            "remote_src": null,
            "path": "/tmp/node_exporter-0.18.1.linux-amd64.tar.gz",
            "owner": null,
            "follow": false,
            "client_key": null,
            "group": null,
            "use_proxy": true,
            "unsafe_writes": null,
            "serole": null,
            "content": null,
            "validate_certs": true,
            "setype": null,
            "client_cert": null,
            "timeout": 10,
            "url_password": null,
            "dest": "/tmp/node_exporter-0.18.1.linux-amd64.tar.gz",
            "selevel": null,
            "force_basic_auth": false,
            "sha256sum": "",
            "http_agent": "ansible-httpget",
            "regexp": null,
            "src": null,
            "url": "https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-amd64.tar.gz",
            "checksum": "sha256:b2503fd932f85f4e5baf161268854bf5d22001869b84f00fd2d1f57b51b72424",
            "seuser": null,
            "headers": null,
            "delimiter": null,
            "mode": null,
            "url_username": null,
            "attributes": null,
            "backup": null,
            "tmp_dest": null
        }
    },
    "owner": "ansible-cumulus",
    "checksum_src": null,
    "size": 8083296,
    "checksum_dest": null,
    "msg": "file already exists",
    "_ansible_no_log": false,
    "attempts": 1,
    "_ansible_delegated_vars": {
        "ansible_host": "{{ inventory_hostname }}.{{ host_domain }}"
    }
}

Python crashes when running role

I'm not sure if this is related to this particular role, or perhaps Python, but I thought it would be a good idea to document the bug in the event anyone else runs into this, and perhaps the maintainers of this repository have recommendations on how to debug this.

I installed this role by doing the following:

requirements.yml

ansible-galaxy install -r requirements.yml

site.yml

- name: Install node_exporter on all bare-metal hosts
  hosts: bare-metal
  roles:
    - cloudalchemy.node-exporter

And then attempted to do a dry run with:

ansible-playbook -i ansible/hosts.yml --ask-become-pass --check ansible/site.yml

The output looks like this:

PLAY [Install node_exporter on all bare-metal hosts] ******************************************************************************************************************

TASK [Gathering Facts] ************************************************************************************************************************************************
ok: [nuc7i5b.brooks.network]
ok: [nuc7i5a.brooks.network]
ok: [pi.brooks.network]

TASK [cloudalchemy.node-exporter : Gather variables for each operating system] ****************************************************************************************
fatal: [nuc7i5a.brooks.network]: FAILED! => {"msg": "No file was found when using with_first_found. Use the 'skip: true' option to allow this task to be skipped if no files are found"}
fatal: [nuc7i5b.brooks.network]: FAILED! => {"msg": "No file was found when using with_first_found. Use the 'skip: true' option to allow this task to be skipped if no files are found"}
ok: [pi.brooks.network] => (item=/Users/brooks/.ansible/roles/cloudalchemy.node-exporter/vars/debian.yml)

TASK [cloudalchemy.node-exporter : Naive assertion of proper listen address] ******************************************************************************************
ok: [pi.brooks.network] => {
    "changed": false,
    "msg": "All assertions passed"
}

TASK [cloudalchemy.node-exporter : Fail on unsupported init systems] **************************************************************************************************
skipping: [pi.brooks.network]

TASK [cloudalchemy.node-exporter : Check collectors] ******************************************************************************************************************

TASK [cloudalchemy.node-exporter : Get checksum list from github] *****************************************************************************************************
objc[93522]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called.
objc[93522]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork()
child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.

And I get this from OSX:

[screenshot: 2018-09-17 4:00 PM]

I'm running this from MacOS 10.13.6, with Python version 2.7.15 installed from brew.
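The objc fork() messages above are macOS's fork-safety check killing the forked worker process. A commonly cited workaround (not specific to this role, and only a suggestion) is to relax that check on the controller before running the playbook:

export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES
ansible-playbook -i ansible/hosts.yml --ask-become-pass --check ansible/site.yml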

IPv4 assumption in SELinux tasks

The following piece of code is actually broken when using IPv6 because it assumes an IPv4 address:

- name: Allow Node Exporter port in SELinux on RedHat OS family
  seport:
    ports: "{{ node_exporter_web_listen_address.split(':')[1] }}"
    proto: tcp
    setype: http_port_t
    state: present
  when:
    - ansible_version.full is version_compare('2.4', '>=')
    - ansible_selinux.status == "enabled"
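One possible fix (a sketch of the approach, matching the variant of this task quoted in a later issue below) is to take the last colon-separated field, which also works for bracketed IPv6 listen addresses such as [::]:9100:

- name: Allow Node Exporter port in SELinux on RedHat OS family
  seport:
    ports: "{{ node_exporter_web_listen_address.split(':')[-1] }}"
    proto: tcp
    setype: http_port_t
    state: present
  when:
    - ansible_version.full is version_compare('2.4', '>=')
    - ansible_selinux.status == "enabled"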

Node exporter fails on ec2 builds with "No file was found when using first_found"

#114 has renamed redhat.yml to redhat-7.yml.

This has broken Amazon ec2 installs with

{"msg": "No file was found when using first_found. Use the 'skip: true' option to allow this task to be skipped if no files are found"}

Amazon returns "ansible_os_family": "RedHat", and here is where I assume we used to match it:

- "{{ ansible_os_family | lower }}.yml"

Just needs a symlink to the old redhat.yml from redhat-7.yml I reckon (or 8 - I'm not sure).
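Alternatively, a more forgiving variable lookup could fall back from a distribution-specific file to the plain OS-family file. The sketch below uses hypothetical file names; the role's actual vars layout may differ:

- name: Gather variables for each operating system
  include_vars: "{{ item }}"
  with_first_found:
    - files:
        - "{{ ansible_distribution | lower }}-{{ ansible_distribution_major_version }}.yml"
        - "{{ ansible_os_family | lower }}-{{ ansible_distribution_major_version }}.yml"
        - "{{ ansible_os_family | lower }}.yml"
      skip: true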

Checksum download sometimes fails (status: 400)

Replacing lookup() with uri:

    - name: Get checksum list from github
      uri:
        url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/sha256sums.txt"
        method: GET
        return_content: true
      register: _checksum_result
      until: _checksum_result.status == 200
      retries: 5

    - name: Set _checksums
      set_fact:
        _checksums: "{{ _checksum_result.content.splitlines() }}"
      run_once: true

Yields the 400, but I'm not sure where the Authorization header is introduced?

<localhost> EXEC /bin/sh -c 'echo ~root && sleep 0'
<localhost> EXEC /bin/sh -c '( umask 77 && mkdir -p "` echo /root/.ansible/tmp `"&& mkdir /root/.ansible/tmp/ansible-tmp-1595343177.2519362-233-56170830947126 && echo ansible-tmp-1595343177.2519362-233-56170830947126="` echo /root/.ansible/tmp/ansible-tmp-1595343177.2519362-233-56170830947126 `" ) && sleep 0'
Using module file /usr/local/lib/python3.7/site-packages/ansible/modules/net_tools/basics/uri.py
Pipelining is enabled.
<localhost> EXEC /bin/sh -c '/usr/local/bin/python && sleep 0'
<localhost> EXEC /bin/sh -c 'rm -f -r /root/.ansible/tmp/ansible-tmp-1595343177.2519362-233-56170830947126/ > /dev/null 2>&1 && sleep 0'
FAILED - RETRYING: Get checksum list from github (1 retries left).Result was: {
    "attempts": 5,
    "changed": false,
    "connection": "close",
    "content": "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<Error><Code>InvalidArgument</Code><Message>Only one auth mechanism allowed; only the X-Amz-Algorithm query parameter, Signature query string parameter or the Authorization header should be specified</Message><ArgumentName>Authorization</ArgumentName><ArgumentValue>Basic **redacted**</ArgumentValue><RequestId>C27DCB3881334C01</RequestId><HostId>FIqOQQeYf2vmrxT1CEeevSvccLgS7KYSLaduFK8FjCsbkHoUDMtcQfE5RLzPE8HWKvjJIj9ozzA=</HostId></Error>",
    "content_type": "application/xml",
    "date": "Tue, 21 Jul 2020 14:52:58 GMT",
    "elapsed": 0,
    "invocation": {
        "module_args": {
            "attributes": null,
            "backup": null,
            "body": null,
            "body_format": "raw",
            "client_cert": null,
            "client_key": null,
            "content": null,
            "creates": null,
            "delimiter": null,
            "dest": null,
            "directory_mode": null,
            "follow": false,
            "follow_redirects": "safe",
            "force": false,
            "force_basic_auth": false,
            "group": null,
            "headers": {},
            "http_agent": "ansible-httpget",
            "method": "GET",
            "mode": null,
            "owner": null,
            "regexp": null,
            "remote_src": null,
            "removes": null,
            "return_content": true,
            "selevel": null,
            "serole": null,
            "setype": null,
            "seuser": null,
            "src": null,
            "status_code": [
                200
            ],
            "timeout": 30,
            "unix_socket": null,
            "unsafe_writes": null,
            "url": "https://github.com/prometheus/node_exporter/releases/download/v1.0.1/sha256sums.txt",
            "url_password": null,
            "url_username": null,
            "use_proxy": true,
            "validate_certs": true
        }
    },
    "msg": "Status code was 400 and not [200]: HTTP Error 400: Bad Request",
    "redirected": false,
    "retries": 6,
    "server": "AmazonS3",
    "status": 400,
    "transfer_encoding": "chunked",
    "url": "https://github.com/prometheus/node_exporter/releases/download/v1.0.1/sha256sums.txt",
    "x_amz_id_2": "FIqOQQeYf2vmrxT1CEeevSvccLgS7KYSLaduFK8FjCsbkHoUDMtcQfE5RLzPE8HWKvjJIj9ozzA=",
    "x_amz_request_id": "C27DCB3881334C01"
}

Originally posted by @till in #165 (comment)

"Allow node_exporter port in SELinux on RedHat OS family" fails on Debian9 with SELinux enabled

What happened?

It seems this task is not working against a Debian 9 host with SELinux enabled.

- name: Allow node_exporter port in SELinux on RedHat OS family
  seport:
    ports: "{{ node_exporter_web_listen_address.split(':')[-1] }}"
    proto: tcp
    setype: http_port_t
    state: present
  when:
    - ansible_version.full is version_compare('2.4', '>=')
    - ansible_selinux.status == "enabled"
TASK [cloudalchemy.node-exporter : Allow node_exporter port in SELinux on RedHat OS family] **************************************************************************************
Monday 28 December 2020  12:09:27 +0100 (0:00:01.755)       0:00:24.342 ******* 
An exception occurred during task execution. To see the full traceback, use -vvv. The error was: ImportError: No module named seobject
fatal: [debian8-server]: FAILED! => {"changed": false, "msg": "Failed to import the required Python library (policycoreutils-python) on debian8-server's Python /usr/bin/python. Please read module documentation and install in the appropriate location. If the required library is installed, but Ansible is using the wrong Python interpreter, please consult the documentation on ansible_python_interpreter"}

However, the seport module is not tested against Debian (https://docs.ansible.com/ansible/2.9/modules/seport_module.html#notes).

I'm not sure, but probably the best idea is to disable the task by adding

- not ansible_distribution | lower == "debian"
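That would make the task's conditions look roughly like this (a sketch of the suggestion above, not the role's actual fix):

- name: Allow node_exporter port in SELinux on RedHat OS family
  seport:
    ports: "{{ node_exporter_web_listen_address.split(':')[-1] }}"
    proto: tcp
    setype: http_port_t
    state: present
  when:
    - ansible_version.full is version_compare('2.4', '>=')
    - ansible_selinux.status == "enabled"
    - not (ansible_distribution | lower == "debian")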

If you agree I can do a PR.

Environment

  • Role version:

    cloudalchemy.node-exporter (0.22.0)

  • Ansible version information:

ansible 2.9.14
  config file = /etc/ansible/ansible.cfg
  configured module search path = [u'/home/my/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/lib/python2.7/dist-packages/ansible
  executable location = /usr/bin/ansible
  python version = 2.7.18 (default, Aug  4 2020, 11:16:42) [GCC 9.3.0]
  • Target packages:
# dpkg -l | grep -i selinux
ii  checkpolicy                    2.6-2                             amd64        SELinux policy compiler
ii  libselinux1:amd64              2.6-3+b3                          amd64        SELinux runtime shared libraries
ii  libsemanage-common             2.6-2                             all          Common files for SELinux policy management libraries
ii  libsemanage1:amd64             2.6-2                             amd64        SELinux policy management library
ii  libsepol1:amd64                2.6-2                             amd64        SELinux library for manipulating binary security policies
ii  policycoreutils                2.6-3                             amd64        SELinux core policy utilities
ii  policycoreutils-dev            2.6-3                             amd64        SELinux core policy utilities (development utilities)
ii  policycoreutils-python-utils   2.6-3                             amd64        SELinux core policy utilities (Python utilities)
ii  python-selinux                 2.6-3+b3                          amd64        Python bindings to SELinux shared libraries
ii  python3-selinux                2.6-3+b3                          amd64        Python3 bindings to SELinux shared libraries
ii  python3-semanage               2.6-2                             amd64        Python3 bindings for SELinux policy management
ii  python3-sepolgen               2.6-3                             all          Python3 module used in SELinux policy generation
ii  python3-sepolicy               2.6-3                             amd64        Python binding for SELinux Policy Analyses
ii  selinux-basics                 0.5.6                             all          SELinux basic support
ii  selinux-policy-default         2:2.20161023.1-9                  all          Strict and Targeted variants of the SELinux policy
ii  selinux-policy-dev             2:2.20161023.1-9                  all          Headers from the SELinux reference policy for building modules
ii  selinux-utils                  2.6-3+b3                          amd64        SELinux utility programs

Unable to use role on MacOS - Python quit unexpectedly

What happened?

I'm trying to use the role to install node-exporter on a CentOS host from my MacOS machine. All other tasks from the playbook, involving various other roles such as docker, work fine. The task "Get checksum" fails with a strange error; see below. In parallel, a system dialog shows up saying that Python quit unexpectedly.

Did you expect to see some different?

I'd expect the role to work :)

How to reproduce it (as minimally and precisely as possible):

I'm just using the following code snippet in my playbook:

- hosts: prometheus_monitoring
  remote_user: ansible
  become: yes
  roles:
    - role: cloudalchemy.node-exporter
  vars:
    node_exporter_basic_auth_users:
      my_username: "{{ node_exporter_basic_auth_password }}"

Environment

ansible-playbook --version
WARNING: Executing a script that is loading libcrypto in an unsafe way. This will fail in a future version of macOS. Set the LIBRESSL_REDIRECT_STUB_ABORT=1 in the environment to force this into an error.
ansible-playbook 2.9.7
  config file = /Users/dherrman/AnsibleProjects/makerspace-ansible/ansible.cfg
  configured module search path = [u'/Users/dherrman/.ansible/plugins/modules', u'/usr/share/ansible/plugins/modules']
  ansible python module location = /Library/Python/2.7/site-packages/ansible
  executable location = /usr/local/bin/ansible-playbook
  python version = 2.7.16 (default, Feb 29 2020, 01:55:37) [GCC 4.2.1 Compatible Apple LLVM 11.0.3 (clang-1103.0.29.20) (-macos10.15-objc-
  • Role version:

    0.21.0

  • Ansible version information:

    See environment

  • Variables:

None relevant to the issue I believe

  • Ansible playbook execution Logs:
TASK [cloudalchemy.node-exporter : Get checksum list from github] **************
task path: /Users/<username>/AnsibleProjects/<project>-ansible/roles/cloudalchemy.node-exporter/tasks/preflight.yml:99
ERROR! A worker was found in a dead state

Anything else we need to know?:

Permission denied to fs

Currently, when running node_exporter as a non-root user, it cannot access some files (for example some filesystems, like /var/lib/docker). Adding cap_dac_read_search=+ep to the binary should solve this problem.
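A minimal sketch of how that could be done with Ansible's capabilities module; the binary path assumes the usual install location and may need adjusting:

- name: Let node_exporter read otherwise restricted paths
  capabilities:
    path: /usr/local/bin/node_exporter
    capability: cap_dac_read_search+ep
    state: present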

Node_exporter is not running with fresh install on Debian 9

I used this role to install and run node_exporter on a fresh new VM; everything is installed, but the service does not run by default.

Setup

ansible 2.7.7
cloudalchemy.node-exporter 0.12.0

Operating System Debian 9

What it does

It installs node-exporter, the service unit, and everything else needed for it to run, BUT it does not start the service.

What it should do

Install everything and start the service.

Step to reproduce

Build a fresh instance of Debian 9 and run the playbook with default values on it.
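As a workaround until the role handles this, a follow-up task outside the role (a sketch, not part of the role itself) can make sure the service is actually running:

- name: Ensure node_exporter is started and enabled
  systemd:
    name: node_exporter
    state: started
    enabled: true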

Textfile directory option missing from service unit file when set to default

What happened?
As per the README, the textfile directory should default to /var/lib/node_exporter.
However, when running the following playbook:

 - hosts: ubuntu                                                                                                                       
   vars:                                                                                                                               
     node_exporter_port: 9100                                                                                                          
     node_exporter_web_listen_address: "0.0.0.0:{{ node_exporter_port }}"                                                              
     node_exporter_enabled_collectors: [textfile]                                                                                      
   tasks:                                                                                                                              
     - name: Set up node exporter                                                                                                      
       import_role:                                                                                                                    
         name: cloudalchemy.node-exporter                                                                                              
       tags: node_exporter                                                                                                             

the systemd unit file ends up looking like this:

#
# Ansible managed
#

[Unit]
Description=Prometheus Node Exporter
After=network-online.target

[Service]
Type=simple
User=node-exp
Group=node-exp
ExecStart=/usr/local/bin/node_exporter \
    --collector.textfile \
    --web.listen-address=0.0.0.0:9100

SyslogIdentifier=node_exporter
Restart=always

PrivateTmp=yes
ProtectHome=yes
NoNewPrivileges=yes

ProtectSystem=strict
ProtectControlGroups=true
ProtectKernelModules=true
ProtectKernelTunables=yes

[Install]
WantedBy=multi-user.target

which is missing the --collector.textfile.directory option, making Node Exporter default to "", which
is not what the README of this role specifies.
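As a workaround sketch (assuming the role only renders options given in the dict form of a collector entry), spelling the textfile collector out with its directory, as in the role's documented default, should restore the flag:

node_exporter_enabled_collectors:
  - textfile:
      directory: "{{ node_exporter_textfile_dir }}"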

Did you expect to see some different?
Yes. I expected the Node Exporter to be set up to have textfile look in /var/lib/node_exporter.
How to reproduce it (as minimally and precisely as possible):
See above.
Environment

  • Role version:
- src: cloudalchemy.node-exporter
  version: 0.14.0
  • Ansible version information:

    ansible --version

ansible 2.9.13
...
  python version = 3.7.6 | packaged by conda-forge | (default, Mar 23 2020, 23:03:20) [GCC 7.3.0]
  • Variables:

See above

Add Support for over proxy installation

What happened?
Cannot fetch binaries over proxy

How to reproduce it (as minimally and precisely as possible):
We would like to export a proxy on the machine running Ansible and have the role use it:
export https_proxy=https://proxy-server:8080; ansible-playbook -i hosts playbook.yml

Environment

Linux

  • Ansible version information:

ansible 2.8.0

  • Ansible playbook execution Logs:
fatal: [host01]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'url'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Failed lookup url for https://github.com/prometheus/node_exporter/releases/download/v0.18.1/sha256sums.txt : <urlopen error [Errno 111] Connection refused>"}

Anything else we need to know?:
Please add

validate_certs: False

In here
https://github.com/cloudalchemy/ansible-node-exporter/blob/master/tasks/preflight.yml#L78
https://github.com/cloudalchemy/ansible-node-exporter/blob/master/tasks/install.yml#L23

Same as
https://github.com/cloudalchemy/ansible-node-exporter/blob/master/tasks/preflight.yml#L58

Pass arbitrary command line flags to node_exporter

What is missing?
A way to pass arbitrary command line flags to node_exporter

Why do we need it?
node_exporter has more parameters than those exposed by the variables in this role. An "escape hatch" to pass any flags would be very useful. Something like:

node_exporter_flags: ["--web.telemetry-path=/foo", "--web.max-requests=99"]

In fact, with this variable in place, node_exporter_web_listen_address could be deprecated IMO as it would be adequately covered by the more general node_exporter_flags.

validate the SSL certificate failed from github

Hi,

I got the following error message when trying to run the Ansible playbook.

ansible-node-exporter version: 0.15.0
to be installed node-exporter version: 0.18.1

https://github.com/cloudalchemy/ansible-node-exporter/blob/master/tasks/preflight.yml#L71

TASK [node-exporter : Get checksum list from github] ***********************************************************************************************************************************
fatal: [host]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'url'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Error validating the server's certificate for https://github.com/prometheus/node_exporter/releases/download/v0.18.1/sha256sums.txt: Failed to validate the SSL certificate for github.com:443. Make sure your managed systems have a valid CA certificate installed. You can use validate_certs=False if you do not need to confirm the servers identity but this is unsafe and not recommended. Paths checked for this platform: /etc/ssl/certs, /etc/ansible, /usr/local/etc/openssl. The exception msg was: [SSL: CERTIFICATE_VERIFY_FAILED] certificate verify failed (_ssl.c:727)."}

Could you help have a look?

Thanks,
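Since the url lookup runs on the Ansible controller, the missing CA bundle is on the controller side. One possible remediation sketch (assuming a Linux controller with a package manager; environments vary) is to refresh ca-certificates there before the role runs:

- name: Ensure CA certificates are present on the controller
  package:
    name: ca-certificates
    state: present
  delegate_to: localhost
  run_once: true
  become: true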

CentOS7.6 failed to start node_exporter: cannot add dependency job for unit node_exporter.service

OS version

$ cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)

node_exporter unit file

$ cat /etc/systemd/system/node_exporter.service
#
# DO NOT EDIT THIS FILE It is automatically generated by Ansible.
#

[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
Type=simple
User=prometheus
Group=prometheus
ExecStart=/data/application/node_exporter/node_exporter \
    --log.level=error \
    --web.telemetry-path=/metrics \
    --collector.tcpstat \
    --collector.processes \
    --collector.netdev.ignored-devices=^(tap|cali|docker|veth|tun).*$ \
    --collector.filesystem.ignored-fs-types=^(autofs|binfmt_misc|bpf|cgroup2?|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|nsfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|selinuxfs|squashfs|sysfs|tracefs|rootfs|nfs)$ \
    --collector.filesystem.ignored-mount-points=^/(dev|proc|sys|boot|run.*|var/lib/kubelet/.+|var/lib/docker/.+|data/docker/overlay)($|/) \
    --collector.netstat.fields=^(.*_(InErrors|InErrs)|Ip_Forwarding|Ip(6|Ext)_(InOctets|OutOctets)|Icmp6?_(InMsgs|OutMsgs)|TcpExt_(Listen.*|Syncookies.*)|Tcp_(ActiveOpens|PassiveOpens|RetransSegs|CurrEstab)|Udp6?_(InDatagrams|OutDatagrams|NoPorts))$ \
    --collector.diskstats.ignored-devices=^(ram|loop|fd|nvme\d+n\d+p|tmpfs|md|up-|sr|rootfs)(\d*)$ \
    --collector.netclass.ignored-devices=^(tap|cali|docker|veth|tun).*$ \
    --collector.textfile.directory=/data/application/node_exporter/text_metrics \
    --no-collector.mdadm \
    --web.listen-address=0.0.0.0:9100

SyslogIdentifier=node_exporter
Restart=always

PrivateTmp=yes
ProtectHome=yes
NoNewPrivileges=yes

ProtectSystem=full
Nice=0

[Install]
WantedBy=multi-user.target

systemctl status

$ systemctl status node_exporter.service -l
โ— node_exporter.service - Prometheus Node Exporter
   Loaded: loaded (/etc/systemd/system/node_exporter.service; enabled; vendor preset: disabled)
   Active: inactive (dead)

Feb 25 16:56:33 systemd[1]: Cannot add dependency job for unit node_exporter.service, ignoring: Unit not found.

Dash ('-') in collection name seems not supported

The dash ('-') no longer seems to be allowed in collection names. It seems it was previously allowed:

โฏ ansible-galaxy collection install cloudalchemy.node-exporter
Process install dependency map
ERROR! Invalid collection name 'cloudalchemy.node-exporter', name must be in the format <namespace>.<collection>. Please make sure namespace and collection name contains characters from [a-zA-Z0-9_] only.

Tested with Ansible version 2.9.16, with python_interpreter set to both python2 and python3.
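Note that this project is published as a role rather than a collection, so a role install with the underscore name should pass galaxy's name validation (a suggestion, not a confirmed resolution of this report):

ansible-galaxy install cloudalchemy.node_exporter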

Adding option to configure collectors

It would be nice to have an option to configure collectors.

My proposal is to depend on the collectors enabled by default in node_exporter and change the collectors variable into node_exporter_enabled_collectors, which would have values like:

node_exporter_enabled_collectors:
- timex
- filesystem:
    option_a: value
    option_b: val

Disable collectors

TASK [cloudalchemy.node-exporter : Assert collectors are not both disabled and enabled at the same time] ***
failed: [host] (item=diskstats) => {
    "assertion": "item in node_exporter_enabled_collectors",
    "changed": false,
    "evaluated_to": false,
    "item": "diskstats",
    "msg": "Assertion failed"
}
failed: [host] (item=mdadm) => {
    "assertion": "item in node_exporter_enabled_collectors",
    "changed": false,
    "evaluated_to": false,
    "item": "mdadm",
    "msg": "Assertion failed"
}

I recently updated the role locally and it started failing with the above message. I think this check was introduced in #22. How do you disable what is enabled by this role (by default)? So far, I was using the role and disabling the collectors that I didn't need, and it worked: node-exporter didn't use the collectors for e.g. mdadm or diskstats.

With #22, my deploy stalls as the collector is enabled by default, but I can't disable it anymore.

Do I need to now explicitly set all enabled and disabled?

FreeBSD support

Hi,
would you accept PRs to add FreeBSD support to your role?
thanks!

Migration 0.15 to 0.19 fails

What happened?
Upgraded the role from 0.15 to 0.19. The role asserts that node-exporter is already installed because /usr/local/bin/node_exporter exists and therefore skips the install.yml tasks. The user is now different from what it was before, and the role fails creating/chowning the textfile collector dir because the new user/group does not exist.

TASK [node-exporter : Create textfile collector dir] ********
fatal: [focal-dev]: FAILED! => changed=false
  gid: 998
  group: _node-exporter
  mode: '0775'
  msg: 'chown failed: failed to look up user node-exp'
  owner: _node-exporter
  path: /var/run/node_exporter
  size: 40
  state: directory
  uid: 997

Did you expect to see some different?

How to reproduce it (as minimally and precisely as possible):
Run the role with version 0.19 on hosts where node-exporter was previously installed using version 0.18 or older.

Environment

  • Role version:

    - cloudalchemy.node-exporter, 0.19.0

  • Ansible version information:

ansible 2.8.5
  config file = /var/lib/ansible/ansible.cfg
  configured module search path = ['/home/vos/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
  ansible python module location = /usr/local/lib/python3.7/dist-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 3.7.3 (default, Apr  3 2019, 05:39:12) [GCC 8.3.0]
  • Variables:
    My variables unchanged from 0.15:
---

node_exporter_version: 0.18.1
node_exporter_system_group: "_node-exporter"

node_exporter_textfile_dir: "/var/run/node_exporter"

node_exporter_enabled_collectors:
  - textfile:
      directory: "{{ node_exporter_textfile_dir }}"

node_exporter_disabled_collectors:
  - systemd
  • Ansible playbook execution Logs:
PLAY [Manage Node Exporter] ***************************************************************************************************************************
TASK [Gathering Facts] ********************************************************************************************************************************
ok: [focal-dev]

TASK [node-exporter : Assert usage of systemd as an init system] **************************************************************************************
ok: [focal-dev] => changed=false
  msg: All assertions passed

TASK [node-exporter : Get systemd version] ************************************************************************************************************
ok: [focal-dev]

TASK [node-exporter : Set systemd version fact] *******************************************************************************************************
ok: [focal-dev]

TASK [node-exporter : Naive assertion of proper listen address] ***************************************************************************************
ok: [focal-dev] => changed=false
  msg: All assertions passed

TASK [node-exporter : Assert collectors are not both disabled and enabled at the same time] ***********************************************************
ok: [focal-dev] => (item=systemd) => changed=false
  ansible_loop_var: item
  item: systemd
  msg: All assertions passed

TASK [node-exporter : Check if node_exporter is installed] ********************************************************************************************
ok: [focal-dev]

TASK [node-exporter : Gather currently installed node_exporter version (if any)] **********************************************************************
ok: [focal-dev]

TASK [node-exporter : Get checksum list from github] **************************************************************************************************
ok: [focal-dev]

TASK [node-exporter : Get checksum for amd64 architecture] ********************************************************************************************
ok: [focal-dev] => (item=b2503fd932f85f4e5baf161268854bf5d22001869b84f00fd2d1f57b51b72424  node_exporter-0.18.1.linux-amd64.tar.gz)

TASK [node-exporter : Copy the Node Exporter systemd service file] ************************************************************************************
ok: [focal-dev]

TASK [node-exporter : Create textfile collector dir] **************************************************************************************************
fatal: [focal-dev]: FAILED! => changed=false
  gid: 998
  group: _node-exporter
  mode: '0775'
  msg: 'chown failed: failed to look up user node-exp'
  owner: _node-exporter
  path: /var/run/node_exporter
  size: 40
  state: directory
  uid: 997

PLAY RECAP ********************************************************************************************************************************************
focal-dev                  : ok=11   changed=0    unreachable=0    failed=1    skipped=11   rescued=0    ignored=0

Anything else we need to know?:
I can work around this by removing /usr/local/bin/node_exporter on all my hosts so that install.yml gets called, but that is not a "clean" upgrade path.

node-exporter service handler is triggered before deployment

Hello,
maybe I am wrong, but shouldn't your role first deploy the node exporter and only then restart the service via a handler?

PLAY [Deploy node_exporter] ********************************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************************
ok: [192.168.33.11]
ok: [192.168.33.10]

TASK [cloudalchemy.node-exporter : check collectors] *******************************************************************************************************

TASK [cloudalchemy.node-exporter : Get checksum for amd64 architecture] ************************************************************************************
skipping: [192.168.33.10] => (item=747ee549c1010947a8b162b1434976fe6cb8445540521d2fcc283765e4be1a79  node_exporter-0.16.0.darwin-386.tar.gz) 
skipping: [192.168.33.10] => (item=73a8c451bd14dea587ebf2fd1258471fe97bddbae6f44b6a9d3ce7e2327bc91d  node_exporter-0.16.0.darwin-amd64.tar.gz) 
skipping: [192.168.33.10] => (item=2f18a32a7bb1c91307ed776cce50559bbcd66af90a61ea0a22a661ebe79e4fda  node_exporter-0.16.0.linux-386.tar.gz) 
skipping: [192.168.33.11] => (item=747ee549c1010947a8b162b1434976fe6cb8445540521d2fcc283765e4be1a79  node_exporter-0.16.0.darwin-386.tar.gz) 
skipping: [192.168.33.11] => (item=73a8c451bd14dea587ebf2fd1258471fe97bddbae6f44b6a9d3ce7e2327bc91d  node_exporter-0.16.0.darwin-amd64.tar.gz) 
skipping: [192.168.33.11] => (item=2f18a32a7bb1c91307ed776cce50559bbcd66af90a61ea0a22a661ebe79e4fda  node_exporter-0.16.0.linux-386.tar.gz) 
ok: [192.168.33.10] => (item=e92a601a5ef4f77cce967266b488a978711dabc527a720bea26505cba426c029  node_exporter-0.16.0.linux-amd64.tar.gz)
skipping: [192.168.33.10] => (item=c793e8278ec6a167a49518d72dd928361a045bd4c8b155a22d5b158dd3aea2ac  node_exporter-0.16.0.linux-arm64.tar.gz) 
skipping: [192.168.33.10] => (item=18c91a0247f4bc97fb7cdd96502cd8a804a96f42a16357b39f43e28b3d2ac864  node_exporter-0.16.0.linux-armv5.tar.gz) 
ok: [192.168.33.11] => (item=e92a601a5ef4f77cce967266b488a978711dabc527a720bea26505cba426c029  node_exporter-0.16.0.linux-amd64.tar.gz)
skipping: [192.168.33.10] => (item=f9518aea4fa7127122a6bf384ba8f70120deaaef75532749f1765cf6e25fd820  node_exporter-0.16.0.linux-armv6.tar.gz) 
skipping: [192.168.33.11] => (item=c793e8278ec6a167a49518d72dd928361a045bd4c8b155a22d5b158dd3aea2ac  node_exporter-0.16.0.linux-arm64.tar.gz) 
skipping: [192.168.33.10] => (item=b8bf44c025ec2c5210bdda185f8e72b29ccd3eb9be339b8dbf96835d4fc1965d  node_exporter-0.16.0.linux-armv7.tar.gz) 
skipping: [192.168.33.11] => (item=18c91a0247f4bc97fb7cdd96502cd8a804a96f42a16357b39f43e28b3d2ac864  node_exporter-0.16.0.linux-armv5.tar.gz) 
skipping: [192.168.33.10] => (item=e0561e421deb02f343e2dd5a75ad322bf6960de56c0fa965d9708f6b237f02b0  node_exporter-0.16.0.netbsd-386.tar.gz) 
skipping: [192.168.33.11] => (item=f9518aea4fa7127122a6bf384ba8f70120deaaef75532749f1765cf6e25fd820  node_exporter-0.16.0.linux-armv6.tar.gz) 
skipping: [192.168.33.10] => (item=293451f83ace3f25e36466fe34024827ac03dee6bf3c3694efdbc0c732959033  node_exporter-0.16.0.netbsd-amd64.tar.gz) 
skipping: [192.168.33.11] => (item=b8bf44c025ec2c5210bdda185f8e72b29ccd3eb9be339b8dbf96835d4fc1965d  node_exporter-0.16.0.linux-armv7.tar.gz) 
skipping: [192.168.33.11] => (item=e0561e421deb02f343e2dd5a75ad322bf6960de56c0fa965d9708f6b237f02b0  node_exporter-0.16.0.netbsd-386.tar.gz) 
skipping: [192.168.33.11] => (item=293451f83ace3f25e36466fe34024827ac03dee6bf3c3694efdbc0c732959033  node_exporter-0.16.0.netbsd-amd64.tar.gz) 

TASK [cloudalchemy.node-exporter : Create the Node Exporter group] *****************************************************************************************
changed: [192.168.33.11]
changed: [192.168.33.10]

TASK [cloudalchemy.node-exporter : Create the Node Exporter user] ******************************************************************************************
changed: [192.168.33.11]
changed: [192.168.33.10]

TASK [cloudalchemy.node-exporter : Download node_exporter binary to local folder] **************************************************************************
ok: [192.168.33.10 -> localhost]
ok: [192.168.33.11 -> localhost]

TASK [cloudalchemy.node-exporter : Unpack node_exporter binary] ********************************************************************************************
skipping: [192.168.33.10]
skipping: [192.168.33.11]

TASK [cloudalchemy.node-exporter : Propagate Node Exporter binaries] ***************************************************************************************
changed: [192.168.33.10]
changed: [192.168.33.11]

TASK [cloudalchemy.node-exporter : Create texfile collector dir] *******************************************************************************************
changed: [192.168.33.11]
changed: [192.168.33.10]

TASK [cloudalchemy.node-exporter : Install libcap on Debian systems] ***************************************************************************************
ok: [192.168.33.11]
ok: [192.168.33.10]

TASK [cloudalchemy.node-exporter : Node exporter can read anything (omit file permissions)] ****************************************************************
changed: [192.168.33.11]
changed: [192.168.33.10]

TASK [cloudalchemy.node-exporter : Copy the Node Exporter systemd service file] ****************************************************************************
changed: [192.168.33.10]
changed: [192.168.33.11]

TASK [cloudalchemy.node-exporter : Install dependencies on RedHat OS family] *******************************************************************************
skipping: [192.168.33.10] => (item=libselinux-python) 
skipping: [192.168.33.10] => (item=policycoreutils-python) 
skipping: [192.168.33.11] => (item=libselinux-python) 
skipping: [192.168.33.11] => (item=policycoreutils-python) 

TASK [cloudalchemy.node-exporter : Allow Node Exporter port in SELinux on RedHat OS family] ****************************************************************
skipping: [192.168.33.10]
skipping: [192.168.33.11]

TASK [cloudalchemy.node-exporter : Ensure Node Exporter is enabled on boot] ********************************************************************************
changed: [192.168.33.11]
changed: [192.168.33.10]

RUNNING HANDLER [cloudalchemy.node-exporter : restart node exporter] ***************************************************************************************
changed: [192.168.33.10]
changed: [192.168.33.11]

PLAY [Deploy blackbox_exporter] ****************************************************************************************************************************

TASK [Gathering Facts] *************************************************************************************************************************************
ok: [192.168.33.11]

TASK [cloudalchemy.blackbox-exporter : create blackbox_exporter system group] ******************************************************************************
changed: [192.168.33.11]

TASK [cloudalchemy.blackbox-exporter : create blackbox_exporter system user] *******************************************************************************
changed: [192.168.33.11]

TASK [cloudalchemy.blackbox-exporter : create blackbox_exporter directories] *******************************************************************************
changed: [192.168.33.11]

TASK [cloudalchemy.blackbox-exporter : download blackbox exporter binary to local folder] ******************************************************************
skipping: [192.168.33.11]

TASK [cloudalchemy.blackbox-exporter : propagate blackbox exporter binary] *********************************************************************************
changed: [192.168.33.11]

TASK [cloudalchemy.blackbox-exporter : Install libcap on Debian systems] ***********************************************************************************
ok: [192.168.33.11]

TASK [cloudalchemy.blackbox-exporter : Ensure blackbox exporter binary has cap_net_raw capability] *********************************************************
changed: [192.168.33.11]

TASK [cloudalchemy.blackbox-exporter : create systemd service unit] ****************************************************************************************
changed: [192.168.33.11]

TASK [cloudalchemy.blackbox-exporter : configure blackbox exporter] ****************************************************************************************
changed: [192.168.33.11]

TASK [cloudalchemy.blackbox-exporter : ensure blackbox_exporter service is enabled] ************************************************************************
changed: [192.168.33.11]

RUNNING HANDLER [cloudalchemy.blackbox-exporter : restart blackbox exporter] *******************************************************************************
changed: [192.168.33.11]

RUNNING HANDLER [cloudalchemy.blackbox-exporter : reload blackbox exporter] ********************************************************************************
changed: [192.168.33.11]

Checksum always failing

What happened?
Installed on my arm64 instance, using ansible 2.7.10: no problems.
Installed on my amd64 instance, using ansible 2.7.18: it fails to find the correct checksum.

with logging:

TASK [node-exporter : Get checksum list from github] **************************************************************************************************************************************************************************************************************************************************************************
task path: /home/demo/zinfra/cailleach/environments/avs-test/ansible/.galaxy/node-exporter/tasks/preflight.yml:99
ok: [avs-test-sft01 -> localhost] => {
    "ansible_facts": {
        "_checksums": "['eb7feb537a96d518644879f617eaef2c28e9af5878c671c0ba0af11d2c27c791  node_exporter-1.0.1.darwin-386.tar.gz', 'e51d39ef14f5c6accee158e94b5e324fa6eb647444234a4be3491fbc3983df47  node_exporter-1.0.1.darwin-amd64.tar.gz', '734e036a849152b185da2080eb8656c36cde862095a464cb17705ca723ea3929  node_exporter-1.0.1.linux-386.tar.gz', '3369b76cd2b0ba678b6d618deab320e565c3d93ccb5c2a0d5db51a53857768ae  node_exporter-1.0.1.linux-amd64.tar.gz', '017514906922fcc4b7d727655690787faed0562bc7a17aa9f72b0651cb1b47fb  node_exporter-1.0.1.linux-arm64.tar.gz', '38413100bfb935c59aea088a0af792134b75972eb90ab2bc6cf1c09ad3b08aea  node_exporter-1.0.1.linux-armv5.tar.gz', 'c1d7affbc7762c478c169830c43b4c6177a761bf1d2dd715dbffa55ca772655a  node_exporter-1.0.1.linux-armv6.tar.gz', 'e7f4427a25f1870103588e4968c7dc8c1426c00a0c029d0183a9a7afdd61357b  node_exporter-1.0.1.linux-armv7.tar.gz', '43335ccab5728b3c61ea7a0977143719c392ce13a90fa0d14169b5c10e8babd0  node_exporter-1.0.1.linux-mips.tar.gz', 'c0109f2f76628d2e25ea78e39d4b95100079ee859863be1471519b5e85a2fe78  node_exporter-1.0.1.linux-mips64.tar.gz', 'bcba02058b9ce171b5c3b077f78f371eb7685239f113200d15787c55fb204857  node_exporter-1.0.1.linux-mips64le.tar.gz', '85f0a24c07c5d8237caf36a5c68a63958280dab802b5056ff36d75563d5e5241  node_exporter-1.0.1.linux-mipsle.tar.gz', '43aa5e72f5068d16eb8d33f6b729186bf558d40ec0c734746b40a16902864808  node_exporter-1.0.1.linux-ppc64.tar.gz', '5ae6c772108c877038cd66a761e4ad93edcc8c446120478499412b24e7953146  node_exporter-1.0.1.linux-ppc64le.tar.gz', '2f22d1ce18969017fb32dbd285a264adf3da6252eec05f03f105cf638ec0bb06  node_exporter-1.0.1.linux-s390x.tar.gz', '7766d78638c2f84d1084a79d8cb5d8f036b7ce375390870d5e709673118d1260  node_exporter-1.0.1.netbsd-386.tar.gz', '41cc54f77f860ed19a7b74f132269f810e3c01fbac5320c3fa2e244fa2247d56  node_exporter-1.0.1.netbsd-amd64.tar.gz']"
    },
    "changed": false
}

TASK [node-exporter : Get checksum for amd64 architecture] ********************************************************************************************************************************************************************************************************************************************************************
task path: /home/demo/zinfra/cailleach/environments/avs-test/ansible/.galaxy/node-exporter/tasks/preflight.yml:104
ok: [avs-test-sft01 -> localhost] => (item=['eb7feb537a96d518644879f617eaef2c28e9af5878c671c0ba0af11d2c27c791  node_exporter-1.0.1.darwin-386.tar.gz', 'e51d39ef14f5c6accee158e94b5e324fa6eb647444234a4be3491fbc3983df47  node_exporter-1.0.1.darwin-amd64.tar.gz', '734e036a849152b185da2080eb8656c36cde862095a464cb17705ca723ea3929  node_exporter-1.0.1.linux-386.tar.gz', '3369b76cd2b0ba678b6d618deab320e565c3d93ccb5c2a0d5db51a53857768ae  node_exporter-1.0.1.linux-amd64.tar.gz', '017514906922fcc4b7d727655690787faed0562bc7a17aa9f72b0651cb1b47fb  node_exporter-1.0.1.linux-arm64.tar.gz', '38413100bfb935c59aea088a0af792134b75972eb90ab2bc6cf1c09ad3b08aea  node_exporter-1.0.1.linux-armv5.tar.gz', 'c1d7affbc7762c478c169830c43b4c6177a761bf1d2dd715dbffa55ca772655a  node_exporter-1.0.1.linux-armv6.tar.gz', 'e7f4427a25f1870103588e4968c7dc8c1426c00a0c029d0183a9a7afdd61357b  node_exporter-1.0.1.linux-armv7.tar.gz', '43335ccab5728b3c61ea7a0977143719c392ce13a90fa0d14169b5c10e8babd0  node_exporter-1.0.1.linux-mips.tar.gz', 'c0109f2f76628d2e25ea78e39d4b95100079ee859863be1471519b5e85a2fe78  node_exporter-1.0.1.linux-mips64.tar.gz', 'bcba02058b9ce171b5c3b077f78f371eb7685239f113200d15787c55fb204857  node_exporter-1.0.1.linux-mips64le.tar.gz', '85f0a24c07c5d8237caf36a5c68a63958280dab802b5056ff36d75563d5e5241  node_exporter-1.0.1.linux-mipsle.tar.gz', '43aa5e72f5068d16eb8d33f6b729186bf558d40ec0c734746b40a16902864808  node_exporter-1.0.1.linux-ppc64.tar.gz', '5ae6c772108c877038cd66a761e4ad93edcc8c446120478499412b24e7953146  node_exporter-1.0.1.linux-ppc64le.tar.gz', '2f22d1ce18969017fb32dbd285a264adf3da6252eec05f03f105cf638ec0bb06  node_exporter-1.0.1.linux-s390x.tar.gz', '7766d78638c2f84d1084a79d8cb5d8f036b7ce375390870d5e709673118d1260  node_exporter-1.0.1.netbsd-386.tar.gz', '41cc54f77f860ed19a7b74f132269f810e3c01fbac5320c3fa2e244fa2247d56  node_exporter-1.0.1.netbsd-amd64.tar.gz']) => {
    "ansible_facts": {
        "node_exporter_checksum": "['eb7feb537a96d518644879f617eaef2c28e9af5878c671c0ba0af11d2c27c791"
    },
    "changed": false,
    "item": "['eb7feb537a96d518644879f617eaef2c28e9af5878c671c0ba0af11d2c27c791  node_exporter-1.0.1.darwin-386.tar.gz', 'e51d39ef14f5c6accee158e94b5e324fa6eb647444234a4be3491fbc3983df47  node_exporter-1.0.1.darwin-amd64.tar.gz', '734e036a849152b185da2080eb8656c36cde862095a464cb17705ca723ea3929  node_exporter-1.0.1.linux-386.tar.gz', '3369b76cd2b0ba678b6d618deab320e565c3d93ccb5c2a0d5db51a53857768ae  node_exporter-1.0.1.linux-amd64.tar.gz', '017514906922fcc4b7d727655690787faed0562bc7a17aa9f72b0651cb1b47fb  node_exporter-1.0.1.linux-arm64.tar.gz', '38413100bfb935c59aea088a0af792134b75972eb90ab2bc6cf1c09ad3b08aea  node_exporter-1.0.1.linux-armv5.tar.gz', 'c1d7affbc7762c478c169830c43b4c6177a761bf1d2dd715dbffa55ca772655a  node_exporter-1.0.1.linux-armv6.tar.gz', 'e7f4427a25f1870103588e4968c7dc8c1426c00a0c029d0183a9a7afdd61357b  node_exporter-1.0.1.linux-armv7.tar.gz', '43335ccab5728b3c61ea7a0977143719c392ce13a90fa0d14169b5c10e8babd0  node_exporter-1.0.1.linux-mips.tar.gz', 'c0109f2f76628d2e25ea78e39d4b95100079ee859863be1471519b5e85a2fe78  node_exporter-1.0.1.linux-mips64.tar.gz', 'bcba02058b9ce171b5c3b077f78f371eb7685239f113200d15787c55fb204857  node_exporter-1.0.1.linux-mips64le.tar.gz', '85f0a24c07c5d8237caf36a5c68a63958280dab802b5056ff36d75563d5e5241  node_exporter-1.0.1.linux-mipsle.tar.gz', '43aa5e72f5068d16eb8d33f6b729186bf558d40ec0c734746b40a16902864808  node_exporter-1.0.1.linux-ppc64.tar.gz', '5ae6c772108c877038cd66a761e4ad93edcc8c446120478499412b24e7953146  node_exporter-1.0.1.linux-ppc64le.tar.gz', '2f22d1ce18969017fb32dbd285a264adf3da6252eec05f03f105cf638ec0bb06  node_exporter-1.0.1.linux-s390x.tar.gz', '7766d78638c2f84d1084a79d8cb5d8f036b7ce375390870d5e709673118d1260  node_exporter-1.0.1.netbsd-386.tar.gz', '41cc54f77f860ed19a7b74f132269f810e3c01fbac5320c3fa2e244fa2247d56  node_exporter-1.0.1.netbsd-amd64.tar.gz']"
}

Did you expect to see some different?
Rather than finding "['eb7feb537a96d518644879f617eaef2c28e9af5878c671c0ba0af11d2c27c791", I expected node_exporter_checksum to be "3369b76cd2b0ba678b6d618deab320e565c3d93ccb5c2a0d5db51a53857768ae".

How to reproduce it (as minimally and precisely as possible):
Run with ansible 2.7.18, instead of 2.7.10

Environment

  • Role version:

    0.21.5

  • Ansible version information:

    ansible 2.7.18
      config file = /etc/ansible/ansible.cfg
      configured module search path = ['/home/demo/.ansible/plugins/modules', '/usr/share/ansible/plugins/modules']
      ansible python module location = /home/demo/zinfra/cailleach/third_party/.poetry/venvs/third-party-jL9HBFdt-py3.8/lib/python3.8/site-packages/ansible
      executable location = /home/demo/zinfra/cailleach/third_party/.poetry/venvs/third-party-jL9HBFdt-py3.8/bin/ansible
      python version = 3.8.4 (default, Jul 13 2020, 21:16:07) [GCC 9.3.0]

  • Variables:

insert role variables relevant to the issue
  • Ansible playbook execution Logs:
insert Ansible logs relevant to the issue here

Anything else we need to know?:

Rate limit errors due to checksum fetching

With a large number of hosts, the task "Get checksum for amd64 architecture" usually fails for at least some of them with the following error:

fatal: [HOST1]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'url'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Received HTTP error for https://github.com/prometheus/node_exporter/releases/download/v0.16.0/sha256sums.txt : HTTP Error 429: Too Many Requests"}
fatal: [HOST2]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'url'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Received HTTP error for https://github.com/prometheus/node_exporter/releases/download/v0.16.0/sha256sums.txt : HTTP Error 429: Too Many Requests"}
fatal: [HOST3]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'url'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Received HTTP error for https://github.com/prometheus/node_exporter/releases/download/v0.16.0/sha256sums.txt : HTTP Error 429: Too Many Requests"}
fatal: [HOST4]: FAILED! => {"msg": "An unhandled exception occurred while running the lookup plugin 'url'. Error was a <class 'ansible.errors.AnsibleError'>, original message: Received HTTP error for https://github.com/prometheus/node_exporter/releases/download/v0.16.0/sha256sums.txt : HTTP Error 429: Too Many Requests"}

This appears to be because the checksum is fetched for each node (which makes sense, they don't all share the architecture), but is fetched by the controller node, so for 40 amd64 nodes, the controller will fetch the amd64 checksums 40 times.

For now I have locally set run_once: true in ~/.ansible/..., but I think this needs a proper solution.
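Another way to avoid hitting GitHub from every run is to fetch and unpack the archive once yourself and point node_exporter_binary_local_dir at the directory holding the binary, which makes the role skip both the checksum lookup and the download. The path below is only an example:

node_exporter_binary_local_dir: "/opt/node_exporter/0.16.0/linux-amd64"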

Use a github token for checksums?

What is missing?

I think the role should support the use of a Github Token to avoid rate limiting.

Currently, I keep running into 400 (Bad Request) errors when I deploy lots of nodes in parallel. That seems to be rate limiting on GitHub's end. In my environment, we continuously deploy to about 20 customer setups, each of which may have multiple nodes. The part we are continuously deploying involves a basic monitoring setup on each node (including the node-exporter).

Why do we need it?

We run Ansible in a container, shared-nothing so to speak. The container gets invoked during CI (merge to main branch).

There is no caching between builds; otherwise it seems to work well. There are no side effects, except for this part where each run makes requests against GitHub (API) resources and seems to run into the rate limit eventually.

I was digging around, and it seems that both of these blocks run every time (unless I download the binary myself):

Checksum gathering:

- block:
    - name: Get checksum list from github
      set_fact:
        _checksums: "{{ lookup('url', 'https://github.com/prometheus/node_exporter/releases/download/v' + node_exporter_version + '/sha256sums.txt', wantlist=True) | list }}"
      run_once: true

    - name: "Get checksum for {{ go_arch }} architecture"
      set_fact:
        node_exporter_checksum: "{{ item.split(' ')[0] }}"
      with_items: "{{ _checksums }}"
      when:
        - "('linux-' + go_arch + '.tar.gz') in item"
  when: node_exporter_binary_local_dir | length == 0

Downloading:

- block:
    - name: Download node_exporter binary to local folder
      become: false
      get_url:
        url: "https://github.com/prometheus/node_exporter/releases/download/v{{ node_exporter_version }}/node_exporter-{{ node_exporter_version }}.linux-{{ go_arch }}.tar.gz"
        dest: "/tmp/node_exporter-{{ node_exporter_version }}.linux-{{ go_arch }}.tar.gz"
        checksum: "sha256:{{ node_exporter_checksum }}"
      register: _download_binary
      until: _download_binary is succeeded
      retries: 5
      delay: 2
      delegate_to: localhost
      check_mode: false

    - name: Unpack node_exporter binary
      become: false
      unarchive:
        src: "/tmp/node_exporter-{{ node_exporter_version }}.linux-{{ go_arch }}.tar.gz"
        dest: "/tmp"
        creates: "/tmp/node_exporter-{{ node_exporter_version }}.linux-{{ go_arch }}/node_exporter"
      delegate_to: localhost
      check_mode: false

    - name: Propagate node_exporter binaries
      copy:
        src: "/tmp/node_exporter-{{ node_exporter_version }}.linux-{{ go_arch }}/node_exporter"
        dest: "{{ _node_exporter_binary_install_dir }}/node_exporter"
        mode: 0755
        owner: root
        group: root
      notify: restart node_exporter
      when: not ansible_check_mode
  when: node_exporter_binary_local_dir | length == 0

I would think nothing should be downloaded unless it is really needed. Do you have any thoughts on changing that?

Environment

  • Role version:

    0.21.3

  • Ansible version information:

root@5c649b77871a:/ansible-all-the-things# ansible --version
ansible 2.9.8
  config file = /ansible-all-the-things/ansible.cfg
  configured module search path = ['/ansible-all-the-things/library']
  ansible python module location = /usr/local/lib/python3.7/site-packages/ansible
  executable location = /usr/local/bin/ansible
  python version = 3.7.8 (default, Jun 30 2020, 18:36:05) [GCC 8.3.0]

Anything else we need to know?:
