linux-system-roles / metrics
An Ansible role which configures metrics collection.
Home Page: https://linux-system-roles.github.io/metrics/
License: MIT License
I tried this role on RHEL 8.3 and it works well; however, a few items could be improved for graph access:
Thanks.
I have the following playbook:
# SPDX-License-Identifier: MIT
---
- name: Check if pcp2elasticsearch has been deployed
hosts: all
roles:
- role: linux-system-roles.metrics
vars:
metrics_into_elasticsearch: yes
tasks:
- name: Check if pcp2elasticsearch is installed
command: test -x /usr/bin/pcp2elasticsearch
This playbook fails on the following distros:
The error message printed by Ansible is as follows:
fatal: [10.0.138.13]: FAILED! => {"changed": false, "msg": "Unable to start service pcp2elasticsearch: Job for pcp2elasticsearch.service failed because the control process exited with error code.\nSee \"systemctl status pcp2elasticsearch.service\" and \"journalctl -xe\" for details.\n"}
The journalctl -xe output shows the real reason:
pcp2elasticsearch.service: Failed at step EXEC spawning /usr/bin/pmrepconf: No such file or directory
The pmrepconf tool is used in the unit file of the pcp2elasticsearch.service service. The issue is that the pmrepconf tool was introduced in pcp version 5.2.0, while all the distros mentioned above use older pcp versions where this tool is not available.
If this is intentional and the metrics_into_elasticsearch functionality should work only with pcp >= 5.2.0, let me know and I will modify my tests accordingly.
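Until this is resolved, one way to keep CI green is to guard the Elasticsearch export on hosts whose pcp predates pmrepconf. The playbook below is only a sketch using package facts; the 5.2.0 comparison reflects the observation above:

```yaml
# SPDX-License-Identifier: MIT
---
- name: Export to Elasticsearch only where pmrepconf exists (pcp >= 5.2.0)
  hosts: all
  tasks:
    - name: Gather installed package versions
      package_facts:

    - name: Run the metrics role with Elasticsearch export enabled
      include_role:
        name: linux-system-roles.metrics
      vars:
        metrics_into_elasticsearch: yes
      when:
        - "'pcp' in ansible_facts.packages"
        - ansible_facts.packages['pcp'][0].version is version('5.2.0', '>=')
```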
The metrics role uses the following set of sub-roles:
Unfortunately these sub-roles are not visible to Ansible in the default configuration.
Let's have a playbook /root/myplaybook.yml in root's home directory:
# SPDX-License-Identifier: MIT
---
- name: My playbook
hosts: all
roles:
- role: linux-system-roles.metrics
vars:
metrics_from_elasticsearch: yes
When the playbook is run using the command ansible-playbook /root/myplaybook.yml, it fails with the following error:
TASK [Setup Elasticsearch metrics] ********************************************************
ERROR! the role 'performancecopilot_metrics_pcp' was not found in /root/roles:/root/.ansible/roles:/usr/share/ansible/roles:/etc/ansible/roles:/root
The error appears to be in '/usr/share/ansible/roles/metrics/roles/performancecopilot_metrics_elasticsearch/tasks/main.yml': line 24, column 11, but may
be elsewhere in the file depending on the exact syntax problem.
The offending line appears to be:
import_role:
name: performancecopilot_metrics_pcp
^ here
Note: a patch for issue #33 has been applied here, as the original pcp role does not exist at all.
The issue is: in the default installation/configuration, the metrics role cannot find its own sub-roles.
TASK [Check if allowed users of bpftrace are configured] ***********************
task path: /tmp/tmpl79__1xz/tests/check_bpftrace.yml:6
fatal: [/cache/centos-8.qcow2]: FAILED! => {"changed": true, "cmd": "grep -w '^allowed_users' /var/lib/pcp/pmdas/bpftrace/bpftrace.conf | grep -wq 'pcptest'", "delta": "0:00:00.005343", "end": "2021-02-02 17:47:01.734173", "msg": "non-zero return code", "rc": 1, "start": "2021-02-02 17:47:01.728830", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
After running the role, logging in to Grafana (thus changing the password), the role fails:
TASK [rhel-system-roles.metrics : Ensure graphing service runtime settings are configured] ************************************************************************************************************************
fatal: [localhost]: FAILED! => {"cache_control": "no-cache", "changed": false, "connection": "close", "content": "{\"message\":\"Invalid username or password\"}", "content_length": "42", "content_type": "application/json; charset=UTF-8", "date": "Mon, 16 Nov 2020 09:33:25 GMT", "elapsed": 0, "expires": "-1", "json": {"message": "Invalid username or password"}, "msg": "Status code was 401 and not [200]: HTTP Error 401: Unauthorized", "pragma": "no-cache", "redirected": false, "status": 401, "url": "http://admin:admin@localhost:3000/api/plugins/performancecopilot-pcp-app/settings", "x_frame_options": "deny"}
While the check is probably helpful initially, I think the role should somehow handle the case where Grafana has already been used, to allow rerunning the role. Thanks.
When running the Metrics role against a CentOS Stream 9 system, the "Ensure performance metric collector authentication is configured" task fails with error:
fatal: [c9s-server1.example.com]: FAILED! => {"changed": false, "msg": "AnsibleUndefinedVariable: '__pcp_sasl_mechlist' is undefined"}
This is probably related to there being no metrics/roles/pcp/vars/CentOS_9.yml file to define this variable.
Here are the ansible_distribution variables gathered from my CentOS 9 Stream system:
"ansible_distribution": "CentOS",
"ansible_distribution_file_parsed": true,
"ansible_distribution_file_path": "/etc/redhat-release",
"ansible_distribution_file_variety": "RedHat",
"ansible_distribution_major_version": "9",
"ansible_distribution_release": "NA",
"ansible_distribution_version": "9",
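A common way for roles to avoid this class of failure is a first_found fallback when loading platform vars, so an unknown distro/version pair degrades to a family-level or default vars file instead of leaving variables undefined. A sketch (the file names are assumptions, not the role's actual layout):

```yaml
- name: Set platform/version specific variables with a generic fallback
  include_vars: "{{ lookup('first_found', params) }}"
  vars:
    params:
      files:
        - "{{ ansible_distribution }}_{{ ansible_distribution_major_version }}.yml"
        - "{{ ansible_distribution }}.yml"
        - "{{ ansible_os_family }}.yml"
        - default.yml
      paths:
        - vars
```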
Hi,
You added a rhel8.conf file in your initial patch:
[options]
version = 1
[rhel8-zeroconf]
interval = 1s
#proc metrics
proc.psinfo.cmd = ,
proc.psinfo.sname = ,
proc.psinfo.ppid = ,
What is it used for?
Is it required when installing pcp-zeroconf?
There is a typo in the README.md: the keyword metrics_with_elasticsearch should be replaced by metrics_into_elasticsearch. The keyword metrics_with_elasticsearch is not used in the metrics role; all tasks related to Elasticsearch work with the metrics_into_elasticsearch keyword.
When metrics_from_mssql: yes is set in a playbook, the role installs the MSSQL agent. However, registration of this agent in PMCD fails due to the missing python3-pyodbc package.
The failure of the registration is not detected by the playbook run: the playbook succeeds, but the MSSQL agent simply does not work because of the missing registration.
Note: This has been tested on Fedora-33 and RHEL-8.4-Development distros. As I do not have Debian or other distros available, I cannot confirm this issue on those other distros.
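As a workaround until the role installs the dependency itself, the package can be pulled in from the playbook before the role runs; a sketch:

```yaml
# SPDX-License-Identifier: MIT
---
- name: Work around the missing MSSQL agent dependency
  hosts: all
  pre_tasks:
    - name: Ensure python3-pyodbc is installed before the MSSQL agent registers
      package:
        name: python3-pyodbc
        state: present
  roles:
    - role: linux-system-roles.metrics
      vars:
        metrics_from_mssql: yes
```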
The role uses a locally defined variable role_name. Unfortunately, a global variable of the same name (see the Ansible docs) sometimes overrides the local value.
One example of such a conflict is in the generation of the pcp2elasticsearch.service file from the roles/performancecopilot_metrics_elasticsearch/templates/pcp2elasticsearch.service.j2 template. The generated file looks, for example, like this:
[Unit]
Description=pcp-to-elasticsearch metrics export service
Documentation=man:pcp2elasticsearch(1)
After=network-online.target pmcd.service
[Service]
TimeoutSec=10
ExecStartPre=/usr/bin/pmrepconf -c \
--option interval=60 \
--option es_index=pcp \
--option es_hostid= \
--option es_server=http://localhost:9200 \
--option es_search_type=pcp-/usr/share/ansible/roles/rhel-system-roles.metrics/roles/performancecopilot_metrics_elasticsearch \
/etc/pcp/pcp2elasticsearch.conf
ExecStart=/usr/bin/pcp2elasticsearch --include-labels :metrics
Restart=on-failure
[Install]
WantedBy=multi-user.target
As can be seen, the value of the command-line switch es_search_type, which is defined as "{{ metrics_provider }}-{{ role_name }}", contains the full path of the role instead of an expected short role identifier like metrics.
Perhaps the best way to avoid such namespace collisions would be to rename the locally defined variable role_name to a name which is not used globally.
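For illustration, a role-private variable with the conventional double-underscore prefix would avoid the collision with Ansible's magic role_name; the variable name here is only a hypothetical suggestion:

```yaml
# roles/performancecopilot_metrics_elasticsearch/defaults/main.yml (sketch)
__metrics_role_id: metrics

# The template would then reference the private name instead of the
# reserved one, e.g.:
#   es_search_type: "{{ metrics_provider }}-{{ __metrics_role_id }}"
```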
TASK [/tmp/tmp_puugelq/roles/elasticsearch : Establish Elasticsearch metrics export package names] ***
task path: /tmp/tmp_puugelq/roles/elasticsearch/tasks/main.yml:20
fatal: [/cache/fedora-32.qcow2]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: '__elasticsearch_packages_export_pcp' is undefined\n\nThe error appears to be in '/tmp/tmp_puugelq/roles/elasticsearch/tasks/main.yml': line 20, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Establish Elasticsearch metrics export package names\n ^ here\n"}
Configuration of Elasticsearch agent fails with the following error:
fatal: [10.0.139.221]: FAILED! => {"changed": false, "checksum": "c076ec4531eacb5b8058b3a0a5b82a1218acf987", "msg": "Destination directory /etc/pcp/elasticsearch does not exist"}
The issue occurs when the following playbook runs:
# SPDX-License-Identifier: MIT
---
- name: Install Elastic search
hosts: all
roles:
- role: linux-system-roles.metrics
vars:
metrics_from_elasticsearch: yes
The problem is in the roles/performancecopilot_metrics_elasticsearch/tasks/main.yml file, in the section named Ensure PCP Elasticsearch agent is configured, where Ansible tries to install an Elasticsearch config file from a template into the /etc/pcp/elasticsearch directory. Unfortunately, the destination directory does not exist, because at that point the pcp-pmda-elasticsearch package is not installed yet, and it is that package which owns the /etc/pcp/elasticsearch directory.
The same problem affects:
* roles/performancecopilot_metrics_mssql/tasks/main.yml and pcp-pmda-mssql
* roles/performancecopilot_metrics_bpftrace/tasks/main.yml and pcp-pmda-bpftrace
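The fix presumably needs to order the package installation before the template task, so the package-owned directory exists when the config file is written. A sketch (task names and template file name are assumptions):

```yaml
- name: Ensure PCP Elasticsearch agent package is installed
  package:
    name: pcp-pmda-elasticsearch
    state: present

- name: Ensure PCP Elasticsearch agent is configured
  template:
    src: elasticsearch.conf.j2
    dest: /etc/pcp/elasticsearch/elasticsearch.conf
    mode: '0644'
```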
I am using the following playbook:
# SPDX-License-Identifier: MIT
---
- name: Ensure that authentication is configured
hosts: all
roles:
- role: linux-system-roles.metrics
vars:
metrics_from_bpftrace: yes
metrics_username: pcptest
metrics_password: tdlendle
tasks:
- name: Check if authentication functionality works
shell: sasldblistusers2 -f /etc/pcp/passwd.db | grep -wq pcptest
- name: Check if a client can access metrics
command: pminfo -f -h "pcp://127.0.0.1?username=pcptest&password=tdlendle" disk.dev.read
Issue # 1
The expectation is that the role will set a SASL password for the pcptest user. Unfortunately this does not happen and the /etc/pcp/passwd.db file is not created. Digging a bit deeper into the role, the problem is IMO in the file roles/performancecopilot_metrics_pcp/tasks/pmcd.yml, namely in its section Ensure performance metric collector SASL accounts are configured.
In this section the name of a SASL user is expected to be stored in a field saslname, however no such field is defined; the role uses the field sasluser instead, which is set to the expected value.
When I change the field saslname to sasluser in the roles/performancecopilot_metrics_pcp/tasks/pmcd.yml file, the role generates the expected /etc/pcp/passwd.db file.
Issue # 2
Even after I apply the change described above (issue # 1), the created /etc/pcp/passwd.db file is empty (it contains no users). That is because the password for the user in the roles/performancecopilot_metrics_pcp/tasks/pmcd.yml file is set using the saslpasswd2 command, and that command is invoked with the -n switch, which prevents it from storing credentials. Removing the -n switch from the saslpasswd2 command fixes the issue: the /etc/pcp/passwd.db file now contains the password for the user, and the command sasldblistusers2 -f /etc/pcp/passwd.db | grep -wq pcptest on the host machine succeeds.
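A sketch of what the corrected task might look like with both fixes applied (field renamed to sasluser per issue # 1, -n dropped per issue # 2). The loop contents and the exact saslpasswd2 invocation are assumptions, not the role's actual code:

```yaml
- name: Ensure performance metric collector SASL accounts are configured
  # -p reads the password from stdin; -a names the SASL application (pmcd)
  shell: echo "{{ item.saslpassword }}" | saslpasswd2 -p -a pmcd "{{ item.sasluser }}"
  loop:
    - { sasluser: "{{ metrics_username }}", saslpassword: "{{ metrics_password }}" }
  no_log: true
```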
Issue # 3
After I apply the fixes described above (issues # 1 and # 2), there is still one problem causing the command pminfo -f -h "pcp://127.0.0.1?username=pcptest&password=tdlendle" disk.dev.read to fail.
On the host system, the cyrus-sasl-scram package is not installed. When I install this package manually, everything starts to work as expected.
The cyrus-sasl-scram package is defined in the role, in the file roles/performancecopilot_metrics_pcp/vars/RedHat.yml, as the variable __pcp_packages_sasl. However, as far as I can see, this variable is not used anywhere else.
I need to update the field names you mentioned in issue #3, and add additional metadata.
In Elasticsearch we send JSON documents. This is an example of how it should look when we send to Elasticsearch:
{
"time": "2017-01-19T:29:10+00:00",
"collectd.processes.ps_code": 21635072,
"dstypes": "gauge",
"interval": 10.0,
"host": "dhcp-0-135.tlv.redhat.com",
"plugin": "processes",
"plugin_instance": "collectd",
"type": "ps_code",
"type_instance": "",
"ovirt": {
"entity": "host",
"host_id": "{{ ovirt_vds_vds_id }}",
"engine_fqdn": "{{ ovirt_engine_fqdn }}",
"cluster_name": "{{ ovirt_vds_cluster_name }}"
},
"tag": "project.ovirt-metrics-{{ ovirt_env_name }}",
"hostname": "hostname",
"ipaddr4": "ip address"
}
Under "ovirt" are additional metadata.
@lberk
When running the Metrics role against a CentOS Stream 9 system, the "Install Redis packages" task fails with error:
fatal: [c9s-server1.example.com]: FAILED! => {"msg": "The task includes an option with an undefined variable. The error was: '__redis_packages_extra' is undefined\n\nThe error appears to be in '/usr/share/linux-system-roles/metrics/roles/redis/tasks/main.yml': line 15, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: Install Redis packages\n ^ here\n"}
This is probably related to there being no metrics/roles/redis/vars/CentOS_9.yml file to define this variable.
Here are the ansible_distribution variables gathered from my CentOS 9 Stream system:
"ansible_distribution": "CentOS",
"ansible_distribution_file_parsed": true,
"ansible_distribution_file_path": "/etc/redhat-release",
"ansible_distribution_file_variety": "RedHat",
"ansible_distribution_major_version": "9",
"ansible_distribution_release": "NA",
"ansible_distribution_version": "9",
When you use the metrics role with metrics_graph_service set to true and then log in to the Grafana instance for the first time, you need to change the username and password. But when you then run a playbook with the metrics role again, it fails on the Ensure graphing service runtime settings are configured task, since Ansible can no longer log in to Grafana.
TASK [/usr/share/linux-system-roles/metrics/roles/grafana : Ensure graphing service runtime settings are configured] ***
fatal: [centos]: FAILED! => {"cache_control": "no-cache", "changed": false, "connection": "close", "content": "{\"message\":\"invalid username or password\"}", "content_length": "42", "content_type": "application/json; charset=UTF-8", "date": "Sat, 11 Dec 2021 23:26:16 GMT", "elapsed": 0, "expires": "-1", "json": {"message": "invalid username or password"}, "msg": "Status code was 401 and not [200]: HTTP Error 401: Unauthorized", "pragma": "no-cache", "redirected": false, "status": 401, "url": "http://admin:admin@localhost:3000/api/plugins/performancecopilot-pcp-app/settings", "x_content_type_options": "nosniff", "x_frame_options": "deny", "x_xss_protection": "1; mode=block"}
Would it be possible to specify grafana username/password to prevent this failure? Ideally I would like to set the password directly in the playbook to avoid setting it for the first time, but I'm not sure if grafana provides a simple way to do that.
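For what it's worth, Grafana does support setting the initial admin password through grafana.ini (the [security] admin_password option) or the GF_SECURITY_ADMIN_PASSWORD environment variable, so one possible direction is pre-seeding the password before the role runs. The task below is only a sketch; grafana_admin_password is a hypothetical variable:

```yaml
- name: Pre-seed the Grafana admin password before first start
  ini_file:
    path: /etc/grafana/grafana.ini
    section: security
    option: admin_password
    value: "{{ grafana_admin_password }}"
```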
The field elasticsearch_agent in the roles/performancecopilot_metrics_elasticsearch/tasks/main.yml file does not have a default value defined. This causes the role to fail when metrics_from_elasticsearch: yes is set in a playbook.
Here is an example playbook:
# SPDX-License-Identifier: MIT
---
- name: Ensure that the role runs
hosts: all
roles:
- role: linux-system-roles.metrics
vars:
metrics_from_elasticsearch: yes
Here is the error message as reported by Ansible:
RUNNING HANDLER [performancecopilot_metrics_pcp : restart pmlogger] ***********************
fatal: [10.0.139.221]: FAILED! => {"msg": "The conditional check 'elasticsearch_agent | bool' failed. The error was: error while evaluating conditional (elasticsearch_agent | bool): 'elasticsearch_agent' is undefined\n\nThe error appears to be in '/usr/share/ansible/roles/metrics/roles/performancecopilot_metrics_pcp/handlers/main.yml': line 19, column 3, but may\nbe elsewhere in the file depending on the exact syntax problem.\n\nThe offending line appears to be:\n\n\n- name: restart pmlogger\n ^ here\n"}
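The straightforward fix would be to give the flag a default in the pcp sub-role so the handler conditional always evaluates; a sketch (the defaults file path is an assumption):

```yaml
# roles/performancecopilot_metrics_pcp/defaults/main.yml (sketch)
elasticsearch_agent: false
```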
We need to be able to send metrics to ViaQ; for that we need support for cert-based authentication, Elasticsearch index parameters, buffer handling, and a back-off mechanism.
In the Rsyslog role the request also sends the following parameters:
type="omelasticsearch"
name="{{ res.name | default('viaq-elasticsearch') }}"
server="{{ res.server_host | default('logging-es') }}"
serverport="{{ res.server_port | default(9200) | int }}"
template="viaq_template"
searchIndex="index_template"
dynSearchIndex="on"
searchType="com.redhat.viaq.common"
bulkmode="on"
writeoperation="create"
bulkid="id_template"
dynbulkid="on"
retryfailures="on"
retryruleset="try_es"
usehttps="on"
In Fluentd we set the following parameters:
@type elasticsearch
host {{ fluentd_elasticsearch_host }}
port {{ fluentd_elasticsearch_port }}
scheme https
client_cert {{ fluentd_elasticsearch_client_cert_path }}
client_key {{ fluentd_elasticsearch_client_key_path }}
ca_file {{ fluentd_elasticsearch_ca_cert_path }}
ssl_verify {{ fluentd_elasticsearch_ssl_verify|lower }}
target_index_key {{ fluentd_elasticsearch_target_index_key }}
remove_keys {{ fluentd_elasticsearch_remove_keys }}
type_name {{ fluentd_elasticsearch_type_name_metrics }}
request_timeout {{ fluentd_elasticsearch_request_timeout_metrics }}
Buffer configurations:
flush_interval {{ fluentd_flush_interval_metrics }}
buffer_chunk_limit {{ fluentd_buffer_chunk_limit_metrics }}
buffer_queue_limit {{ fluentd_buffer_queue_limit_metrics }}
buffer_queue_full_action {{ fluentd_buffer_queue_full_action_metrics }}
retry_wait {{ fluentd_retry_wait_metrics }}
retry_limit {{ fluentd_retry_limit_metrics }}
disable_retry_limit {{ fluentd_disable_retry_limit_metrics }}
max_retry_wait {{ fluentd_max_retry_wait_metrics }}
flush_at_shutdown {{ fluentd_flush_at_shutdown_metrics }}
num_threads {{ fluentd_num_threads_metrics }}
slow_flush_log_threshold {{ fluentd_slow_flush_log_threshold_metrics }}
Can you please update the status in PCP? What is missing, and will it be possible to implement? This is a blocker for oVirt.
Galaxy playbook is failing to install Grafana. Error is:
fatal: [localhost]: FAILED! => {"changed": false, "msg": "No package matching 'grafana' found available, installed or updated", "rc": 126, "results": ["No package matching 'grafana' found available, installed or updated"]}
Grafana is available to be installed via YUM:
[justin@netmon2 pcp-install]$ sudo yum search grafana
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: mirror.datto.com
* epel: mirror.grid.uchicago.edu
* extras: mirrors.xtom.com
* updates: centos.vwtonline.net
================================================================================= N/S matched: grafana =================================================================================
pcp-webapp-grafana.noarch : Grafana web application for Performance Co-Pilot (PCP)
The OS running the playbook is CentOS 7.6.
Does PCP allow collecting metrics from multiple inputs and defining a
different output to each one?
Yes. PCP operates on a pull model. Each 'client' specifies which
metrics it wants from which pmcd (the pcp 'daemon') and they're
responded to accordingly.
For example, oVirt metrics (collected from collectd) to ViaQ (OpenShift logging Elasticsearch), and pcp-zeroconf metrics to local.
Yes. That would all work. They can all be different client tools as well, i.e.:
oVirt metrics (collectd/write_prometheus) collected by pcp's pmdaprometheus, then written to Elasticsearch via pcp2elasticsearch (either on the same host, or remotely, or both remote),
and then pcp-zeroconf metrics to local. This would be done by pmlogger logging the metrics to very storage-efficient archives.
When a playbook has metrics_from_bpftrace: yes set, the role generates a wrong value for the allowed_users field in the /etc/pcp/bpftrace/bpftrace.conf config file.
The allowed_users field in the generated /etc/pcp/bpftrace/bpftrace.conf config file looks as follows:
allowed_users = root,/usr/share/ansible/roles/metrics/roles/performancecopilot_metrics_bpftrace
The expectation is to have only the root user in the field; the path to the bpftrace sub-role is erroneous.
IMO the issue is in the tasks/main.yml file, where the section "Setup bpftrace metrics" contains the following line:
- { user: "{{ role_name }}", sasluser: "{{ metrics_username }}", saslpassword: "{{ metrics_password }}" }
The user field expands to the path of the sub-role.
We need the PCP role to be able to collect metrics from a Prometheus endpoint or a Collectd with write_prometheus output plugin.
Is a restart of PCP required after installing additional packages?
@lberk
Does pcp-zeroconf deploy a default configuration file that determines which metrics to collect from the machine?
In case we don't want to collect metrics based on the pcp-zeroconf configuration, what are the base packages required for pcp?
Let's have the following playbook:
# SPDX-License-Identifier: MIT
---
- name: Ensure that the role runs
hosts: all
roles:
- role: linux-system-roles.metrics
When this playbook runs on RHEL-6, it fails with the following error message:
fatal: [10.0.139.92]: FAILED! => {"changed": false, "msg": "No package matching 'cyrus-sasl-scram' found available, installed or updated", "rc": 126, "results": ["cyrus-sasl-lib-2.1.23-15.el6_6.2.x86_64 providing cyrus-sasl-lib is already installed", "No package matching 'cyrus-sasl-scram' found available, installed or updated"]}
IMO the reason is in the file roles/performancecopilot_metrics_pcp/vars/RedHat.yml, where this package is requested to be installed. Unfortunately, the cyrus-sasl-scram package was introduced in RHEL/CentOS >= 7 and is not available on RHEL-6 or CentOS-6.
While on CentOS the cyrus-sasl-scram package is requested only on CentOS-7 and CentOS-8 (in the roles/performancecopilot_metrics_pcp/vars/CentOS_8.yml and roles/performancecopilot_metrics_pcp/vars/CentOS_7.yml files), on RHEL Ansible tries to install it on RHEL-6 as well.
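One way to express this in the shared RedHat.yml, rather than adding per-version vars files, would be to gate the package on the distribution major version. This is only a sketch of the idea, not the role's actual variable layout:

```yaml
# roles/performancecopilot_metrics_pcp/vars/RedHat.yml (sketch)
# cyrus-sasl-scram only exists on RHEL/CentOS >= 7, so emit an empty
# package list on older releases.
__pcp_packages_sasl: "{{ ['cyrus-sasl-scram'] if ansible_distribution_major_version | int >= 7 else [] }}"
```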
In the metrics role there are the following tasks pointing to the role itself:
However, they point to it as the pcp role, while the role itself is called performancecopilot_metrics_pcp.
Any playbook enabling bpftrace, elasticsearch, or mssql fails on this issue. Renaming the role from pcp to performancecopilot_metrics_pcp in those three tasks fixes the issue.
Background
The bpftrace agent is delivered to a system via the pcp-pmda-bpftrace package. Starting from PCP version 5.2, the pcp-pmda-bpftrace package has changed the layout of files on the filesystem. While in PCP <= 5.1 the PMDA files are located in the /var/lib/pcp/pmdas/bpftrace directory, in PCP >= 5.2 the files are located in /usr/libexec/pcp/pmdas/bpftrace and bpftrace.conf is located in the /etc/pcp/bpftrace/ directory. For PCP >= 5.2 there are symlinks from the old /var/lib/pcp/pmdas/bpftrace directory to the new locations, to achieve backward compatibility.
The issue
When metrics_from_bpftrace: yes is set in a playbook, the role generates the bpftrace.conf file in the /etc/pcp/bpftrace/ directory (so it supports PCP >= 5.2). However, when the role runs on a platform with PCP <= 5.1, the PMDA expects the config file to be in the /var/lib/pcp/pmdas/bpftrace directory. As such, none of the platforms with PCP <= 5.1 have the bpftrace agent configured properly (a default config file is used instead).
List of affected platforms:
The same issue also affects the configuration files of the mssql and elasticsearch PMDAs.
Do not use ignore_errors: yes as in https://github.com/linux-system-roles/metrics/blob/master/roles/elasticsearch/tasks/main.yml#L62
If it is necessary to ignore errors from the task, the task should register the output and then have a separate task that fails in case there are "real" errors, so as not to mask any unexpected errors.
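The usual register-then-fail pattern looks roughly like this; the command and the error-matching condition are placeholders, not the role's actual task:

```yaml
- name: Run a task whose failure may be expected
  command: /usr/bin/some-pcp-setup-step   # hypothetical command
  register: __result
  failed_when: false

- name: Fail only on unexpected errors
  fail:
    msg: "Unexpected error: {{ __result.stderr }}"
  when:
    - __result.rc != 0
    - "'known harmless message' not in __result.stderr"
```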
When I try to enable import of metrics from Elasticsearch, I need to manually restart pmcd on the host machine after a playbook run. Without the restart, the elasticsearch agent is not registered.
The role installs the pcp-pmda-elasticsearch package and creates the /var/lib/pcp/pmdas/elasticsearch/.NeedInstall file. However, pmcd is not restarted afterwards, so the pmcd startup script does not register the agent.
I am using a playbook like this one:
# SPDX-License-Identifier: MIT
---
- name: Make ElasticSearch metrics available
hosts: all
roles:
- role: linux-system-roles.metrics
vars:
metrics_from_elasticsearch: yes
Note: When I run the playbook above, I do not see any execution of the restart pmcd handler in the log. However, if the playbook explicitly forces execution of handlers ...
# SPDX-License-Identifier: MIT
---
- name: Make ElasticSearch metrics available
hosts: all
roles:
- role: linux-system-roles.metrics
vars:
metrics_from_elasticsearch: yes
tasks:
- name: Flush handlers
meta: flush_handlers
... then I see the following message in the log:
RUNNING HANDLER [performancecopilot_metrics_pcp : restart pmcd] ***************************
skipping: [10.0.137.131]
So it looks like the role sends a notification to restart pmcd, but the handler is skipped for some reason.
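Until the handler behaviour is fixed, restarting pmcd explicitly from the playbook works around the missing registration; a sketch:

```yaml
# SPDX-License-Identifier: MIT
---
- name: Make ElasticSearch metrics available
  hosts: all
  roles:
    - role: linux-system-roles.metrics
      vars:
        metrics_from_elasticsearch: yes
  post_tasks:
    - name: Restart pmcd so the Elasticsearch PMDA gets registered
      service:
        name: pmcd
        state: restarted
```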
When metrics_graph_service: yes is set in a playbook, the role installs the grafana-pcp package. On the latest releases of RHEL and Fedora, the grafana-pcp package (version 3.x.y) delivers a Grafana dashboard named PCP Vector: eBPF/BCC Overview. However, this dashboard requires the pcp-pmda-bcc package to be installed and configured.
The metrics role currently does not install the BCC PMDA, so all the charts of the mentioned dashboard are in an error state, unable to get metrics from the BCC agent.
I have a playbook like this:
# SPDX-License-Identifier: MIT
---
- name: Ensure that MSSQL is configured
hosts: all
roles:
- role: linux-system-roles.metrics
vars:
metrics_from_mssql: yes
pre_tasks:
- name: Ensure python3-pyodbc is installed
package:
name: python3-pyodbc
state: present
tasks:
- name: Check if MSSQL functionality works
shell: pmprobe -I pmcd.agent.name | grep -w '"mssql"'
On Fedora and CentOS this works just fine. However on RHEL it fails with the following error message:
TASK [Check if mssql pmda is registered] ************************************************** fatal: [10.0.139.219]: FAILED! => {"changed": true, "cmd": "pmprobe -I pmcd.agent.name | grep -w '\"mssql\"'", "delta": "0:00:00.012001", "end": "2021-01-29 04:59:55.181201", "msg": "non-zero return code", "rc": 1, "start": "2021-01-29 04:59:55.169200", "stderr": "", "stderr_lines": [], "stdout": "", "stdout_lines": []}
... so the registration of the PMDA has not finished. The /var/lib/pcp/pmdas/mssql/.NeedInstall file is created, but pmcd is not restarted.
The reason is IMO somehow related to the pcp-zeroconf package, which starts and enables the pmcd service only on the RHEL platform, while on Fedora and CentOS pmcd is not started or enabled after boot.