
elasticsearch-metrics's People

Contributors

calmzeala, foorb, keyboardfann, qingsongyao, rogerdk, trevorndodds, trinitronx


elasticsearch-metrics's Issues

Use a code for the status

Hi,

It would be nice to have a return code for the status, so that a color can be set on the dashboard.

Best regards,

elasticsearch2elastic.py stops when the ES query takes too long

After elasticsearch2elastic.py has run for some time, it crashes because timediff < 0. I think this happens when it queries a busy ES cluster and the collection time becomes bigger than the interval.

[root@xxx init.d]# systemctl status eshealthcollector-prod 
โ— eshealthcollector-prod.service - Elasticsearch Health Collector - xxx Production Cluster
   Loaded: loaded (/usr/lib/systemd/system/eshealthcollector-prod.service; enabled; vendor preset: disabled)
   Active: active (exited) since Mon 2017-04-24 15:53:13 CST; 12min ago
  Process: 29808 ExecStop=/bin/sh /etc/init.d/eshealthcollector-prod stop (code=exited, status=0/SUCCESS)
  Process: 29818 ExecStart=/bin/sh /etc/init.d/eshealthcollector-prod start (code=exited, status=0/SUCCESS)
 Main PID: 29818 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/eshealthcollector-prod.service

Apr 24 16:02:54 xxx sh[29818]: time:1493020954.75
Apr 24 16:02:54 xxx sh[29818]: timediff8.77753591537
Apr 24 16:02:54 xxx sh[29818]: Total Elapsed Time: 10.9948761463
Apr 24 16:02:54 xxx sh[29818]: nextRun:1493020973.54
Apr 24 16:02:54 xxx sh[29818]: time:1493020974.53
Apr 24 16:02:54 xxx sh[29818]: timediff:-0.994902133942
Apr 24 16:02:54 xxx sh[29818]: Traceback (most recent call last):
Apr 24 16:02:54 xxx sh[29818]: File "/admin/scripts/eshealthcollector-prod/elasticsearch2elastic.py", line 112, in <module>
Apr 24 16:02:54 xxx sh[29818]: time.sleep(timeDiff)
Apr 24 16:02:54 xxx sh[29818]: IOError: [Errno 22] Invalid argument
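The traceback shows `time.sleep()` being handed a negative value, which raises `IOError: [Errno 22] Invalid argument` on Python 2. A minimal sketch of a fix, guarding the sleep (the function name is illustrative, not from the script):

```python
import time

def sleep_until(next_run):
    """Sleep until next_run, never passing a negative value to time.sleep().

    On a busy cluster a collection pass can take longer than the polling
    interval, making next_run - time.time() negative; clamping to zero
    makes the loop simply start the next pass immediately instead of
    crashing.
    """
    time_diff = next_run - time.time()
    if time_diff > 0:
        time.sleep(time_diff)
```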

Can't get elasticsearch cluster stats

Hi,

I'm using the following settings; when I run the script it shows the HTTP error mentioned below. I have also added the Elasticsearch datasource in Grafana.

# ElasticSearch Cluster to Monitor
elasticServer = os.environ.get('ES_METRICS_CLUSTER_URL', 'http://elasticserverIP:9200')
interval = int(os.environ.get('ES_METRICS_INTERVAL', '60'))

# ElasticSearch Cluster to Send Metrics
elasticIndex = os.environ.get('ES_METRICS_INDEX_NAME', '*')
elasticMonitoringCluster = os.environ.get('ES_METRICS_MONITORING_CLUSTER_URL', 'http://elasticserverIP:9200')

Error:

Error:  HTTP Error 400: Bad Request
Error:  HTTP Error 400: Bad Request
Error:  HTTP Error 400: Bad Request
Error:  HTTP Error 400: Bad Request
Error:  HTTP Error 400: Bad Request
Error:  HTTP Error 400: Bad Request
Error:  HTTP Error 400: Bad Request
Total Elapsed Time: 1.16779112816

How can I get rid of this issue?

Regards,
redhawk19

Can't connect to elasticsearch cluster with basic auth

I recently added auth to my ELK stack using readonlyrest. Elasticsearch-metrics was working fine before auth was enabled, but it won't connect anymore. Looking through elasticsearch2elastic.py, I see there is no code to handle a username/password combination and send it as part of the request.
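The script uses the standard library's urlopen with no credentials. One way to add basic auth would be a small helper like this sketch (written against Python 3's `urllib.request`; the function name and parameters are illustrative, not part of the script):

```python
import base64
import urllib.request

def open_with_basic_auth(url, username, password, timeout=10):
    """Open a URL with an HTTP Basic Auth header attached.

    Builds the base64 "user:password" token by hand and sets the
    Authorization header explicitly, which works with proxies like
    readonlyrest that expect the header on every request.
    """
    token = base64.b64encode("{}:{}".format(username, password).encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": "Basic " + token})
    return urllib.request.urlopen(request, timeout=timeout)
```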

Node OS, Node Stat Indices, Node JVM Displaying Error

None of the graphs mentioned above display data; they all throw the error below (Cannot parse name:()).
    "root_cause": [
        {
            "type": "parse_exception",
            "reason": "parse_exception: Encountered \" \")\" \") \"\" at line 1, column 6.\nWas expecting one of:\n    \"+\" ...\n    \"-\" ...\n    \"(\" ...\n    \"*\" ...\n    \"[\" ...\n    \"{\" ...\n    "
        }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
        {
            "shard": 0,
            "index": "elasticsearch_metrics-2017.01.27",
            "node": "Y9VfAr1aS0-QeiLEG9UmIg",
            "reason": {
                "type": "query_shard_exception",
                "reason": "Failed to parse query [name:()]",
                "index_uuid": "XoXqMb2GS0-_mcPsgESjZA",
                "index": "elasticsearch_metrics-2017.01.27",
                "caused_by": {
                    "type": "parse_exception",
                    "reason": "parse_exception: Cannot parse 'name:()': Encountered \" \")\" \") \"\" at line 1, column 6.\nWas expecting one of:\n    \"+\" ...\n    \"-\" ...\n    \"(\" ...\n    \"*\" ...\n    \"[\" ...\n    \"{\" ...\n    ",
                    "caused_by": {
                        "type": "parse_exception",
                        "reason": "parse_exception: Encountered \" \")\" \") \"\" at line 1, column 6.\nWas expecting one of:\n    \"+\" ...\n    \"-\" ...\n    \"(\" ...\n    \"*\" ...\n    \"[\" ...\n    \"{\" ...\n    "
                    }
                }
            }
        }
    ],
    "caused_by": {
        "type": "query_shard_exception",
        "reason": "Failed to parse query [name:()]",
        "index_uuid": "XoXqMb2GS0-_mcPsgESjZA",
        "index": "elasticsearch_metrics-2017.01.27",
        "caused_by": {
            "type": "parse_exception",
            "reason": "parse_exception: Cannot parse 'name:()': Encountered \" \")\" \") \"\" at line 1, column 6.\nWas expecting one of:\n    \"+\" ...\n    \"-\" ...\n    \"(\" ...\n    \"*\" ...\n    \"[\" ...\n    \"{\" ...\n    ",
            "caused_by": {
                "type": "parse_exception",
                "reason": "parse_exception: Encountered \" \")\" \") \"\" at line 1, column 6.\nWas expecting one of:\n    \"+\" ...\n    \"-\" ...\n    \"(\" ...\n    \"*\" ...\n    \"[\" ...\n    \"{\" ...\n    "
            }
        }
    }
}

Difference between ES_METRICS_CLUSTER_URL & ES_METRICS_MONITORING_CLUSTER_URL ?

Hi All,

Can you explain the difference between ES_METRICS_CLUSTER_URL and ES_METRICS_MONITORING_CLUSTER_URL? I am a little confused about what their values should be.
As far as I understand, ES_METRICS_CLUSTER_URL="http://elestic_serch_node1:9200, http://elestic_serch_node1:9200, http://elestic_serch_node1:9200" (all Elasticsearch cluster nodes). Is this correct?
Also, I want to know the value for ES_METRICS_MONITORING_CLUSTER_URL="????": is it the collector node URL or something else?

Thanks in advance.

Created a Service File for Linux

So I created a service file so that these metrics can be started automatically on reboot. It should work on any distro that uses systemd, but it has only been tested on Ubuntu 18.04. You'll also need to modify the path to the Python file:

[Unit]
Description=ElasticSearch Grafana Metrics Update Service
After=network.target
After=elasticsearch.service

[Service]
Type=simple
User=root
ExecStart=/opt/elasticsearch_metrics/elasticsearch2elastic.py

[Install]
WantedBy=multi-user.target
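To install it, something along these lines should work (the unit file name is illustrative):

```shell
# Assuming the unit above is saved as /etc/systemd/system/es-metrics.service
sudo systemctl daemon-reload
sudo systemctl enable --now es-metrics.service
systemctl status es-metrics.service
```

Note that `ExecStart` runs the script directly, so the file needs an executable bit and a shebang line; alternatively, prefix the path with the Python interpreter in `ExecStart`.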

ESv7.0 compatibility

I have tried the same dashboard with an Elasticsearch v7.6.0 datasource, but no data is displayed or fetched. Any suggestions? Awaiting a quick response.

Pull down for selecting Nodes works only for the first 10 items

I have an ES cluster with "n" nodes. The script correctly loads stats into the elasticsearch-metrics index for all "n" nodes, and the cluster-level dashboards populate fine for any selection. However, whether I select All or an individual node, only the first 10 nodes in the list display data in the node-level panels. The dropdown shows all "n" nodes correctly. I did try adding "size": n to the template, but it had no effect.
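For reference, the 10-item cut-off usually comes from the Terms group-by inside each panel's query, whose size defaults to 10 in Grafana's Elasticsearch datasource, not from the templating variable. A hedged sketch of the relevant fragment of a panel's JSON (the field name is illustrative); setting size to "0" means no limit:

```json
"bucketAggs": [
  {
    "type": "terms",
    "field": "node_name",
    "settings": { "size": "0", "order": "desc", "orderBy": "_term" }
  }
]
```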

Usage in grafana

  1. I have already used the dashboard, and thanks for the great idea. At first I was confused about how to integrate it with Grafana; I finally made it work, so I'd like to add a supplement.
  2. Steps:
    1. My environment: Python 2.7, Grafana 4.4.3, Elasticsearch 5.x.
    2. Clone the repo to your host and run cd elasticsearch-metrics/Grafana && python elasticsearch2elastic.py.
    3. I suggest you read elasticsearch2elastic.py first.
    4. Open Grafana, choose Dashboard => Import, and paste the dashboard id into the input box.
    5. Grafana will load the dashboard automatically.
    6. Fill in the data source name for the elasticsearch_prod_metrics choice; the data is created by elasticsearch2elastic.py. Before this step you need to go to DataSource and add a new source whose index name is elasticsearch_metrics*.
    7. Choose the datasource, click Import, and enjoy it.

No data displayed, fails to load cluster or node name

Hello,

The whole dashboard is red, so no data is loaded. There are many errors; this is one of them:

    "root_cause": [
        {
            "type": "parse_exception",
            "reason": "parse_exception: Encountered \"\" at line 1, column 13.\nWas expecting one of:\n     ...\n    \"(\" ...\n    \"*\" ...\n     ...\n     ...\n     ...\n     ...\n     ...\n    \"[\" ...\n    \"{\" ...\n     ...\n    "
        },
        {
            "type": "parse_exception",
            "reason": "parse_exception: Encountered \"\" at line 1, column 13.\nWas expecting one of:\n     ...\n    \"(\" ...\n    \"*\" ...\n     ...\n     ...\n     ...\n     ...\n     ...\n    \"[\" ...\n    \"{\" ...\n     ...\n    "
        }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
        {
            "shard": 0,
            "index": "graylog_0",
            "node": "qX82JCzNQFyLACYz8s60zA",
            "reason": {
                "type": "query_parsing_exception",
                "reason": "Failed to parse query [cluster_name:]",
                "index": "graylog_0",
                "line": 1,
                "col": 195,
                "caused_by": {
                    "type": "parse_exception",
                    "reason": "parse_exception: Cannot parse 'cluster_name:': Encountered \"\" at line 1, column 13.\nWas expecting one of:\n     ...\n    \"(\" ...\n    \"*\" ...\n     ...\n     ...\n     ...\n     ...\n     ...\n    \"[\" ...\n    \"{\" ...\n     ...\n    ",
                    "caused_by": {
                        "type": "parse_exception",
                        "reason": "parse_exception: Encountered \"\" at line 1, column 13.\nWas expecting one of:\n     ...\n    \"(\" ...\n    \"*\" ...\n     ...\n     ...\n     ...\n     ...\n     ...\n    \"[\" ...\n    \"{\" ...\n     ...\n    "
                    }
                }
            }
        },
        {
            "shard": 0,
            "index": "graylog_1",
            "node": "qX82JCzNQFyLACYz8s60zA",
            "reason": {
                "type": "query_parsing_exception",
                "reason": "Failed to parse query [cluster_name:]",
                "index": "graylog_1",
                "line": 1,
                "col": 195,
                "caused_by": {
                    "type": "parse_exception",
                    "reason": "parse_exception: Cannot parse 'cluster_name:': Encountered \"\" at line 1, column 13.\nWas expecting one of:\n     ...\n    \"(\" ...\n    \"*\" ...\n     ...\n     ...\n     ...\n     ...\n     ...\n    \"[\" ...\n    \"{\" ...\n     ...\n    ",
                    "caused_by": {
                        "type": "parse_exception",
                        "reason": "parse_exception: Encountered \"\" at line 1, column 13.\nWas expecting one of:\n     ...\n    \"(\" ...\n    \"*\" ...\n     ...\n     ...\n     ...\n     ...\n     ...\n    \"[\" ...\n    \"{\" ...\n     ...\n    "
                    }
                }
            }
        }
    ]
}

This is the templating configuration (screenshot omitted).

What could it be?

Cluster and node names with "-"

This may not be the right place to ask, but I will try.
From the Grafana Labs page:

If your cluster_name or node names have "-" you will have to load a custom index to set the "name" field to not_analyzed

What exactly do you mean by loading a custom index? The issue I see is that filtering by cluster name does not actually filter; the full data set is returned for every cluster selection.
If I convert the panel queries to cluster_name.keyword:$Cluster, the filtering works.
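For what it's worth, "loading a custom index" most likely means applying an index template so the name fields are not analyzed before the metrics indices are created. A minimal sketch for ES 5.x, using the message document type the script writes to (the exact field list is an assumption):

```json
PUT _template/elasticsearch_metrics
{
  "template": "elasticsearch_metrics-*",
  "mappings": {
    "message": {
      "properties": {
        "cluster_name": { "type": "keyword" },
        "node_name":    { "type": "keyword" }
      }
    }
  }
}
```

With the fields mapped as keyword, queries like cluster_name:$Cluster match the whole hyphenated name instead of its tokenized parts.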

why we need index+date as index name

Hi I saw code

url = "%(cluster)s/%(index)s-%(index_period)s/message" % url_parameters

I'm wondering why we create a different index for each day's metrics.
The Grafana data source needs to specify an index name,
so does the dashboard only show the metrics of a specific date?
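For reference, Grafana's Elasticsearch datasource understands date-patterned index names, so daily indices do not limit the dashboard to a single date. A sketch of the datasource settings, assuming the script's default index prefix:

```
Index name: [elasticsearch_metrics-]YYYY.MM.DD
Pattern:    Daily
```

Grafana then resolves the pattern to the set of daily indices covering the selected time range, which also keeps each index small and easy to expire.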

elasticsearch2elastic.py crashes because it can't get the cluster name

elasticsearch2elastic.py exits under heavy ES cluster load or network problems. When the program stops, we have to restart it manually. Could we have a retry mechanism?

May 01 00:23:51 xxx sh[46351]: Total Elapsed Time: 0.115025043488
May 01 00:23:51 xxx sh[46351]: Total Elapsed Time: 0.173295974731
May 01 00:23:51 xxx sh[46351]: Traceback (most recent call last):
May 01 00:23:51 xxx sh[46351]: File "/admin/scripts/eshealthcollector-stage/elasticsearch2elastic-stage.py", line 118, in <module>
May 01 00:23:51 xxx sh[46351]: main()
May 01 00:23:51 xxx sh[46351]: File "/admin/scripts/eshealthcollector-stage/elasticsearch2elastic-stage.py", line 99, in main
May 01 00:23:51 xxx sh[46351]: fetch_nodestats(clusterName)
May 01 00:23:51 xxx sh[46351]: File "/admin/scripts/eshealthcollector-stage/elasticsearch2elastic-stage.py", line 55, in fetch_nodestats
May 01 00:23:51 xxx sh[46351]: response = urllib.urlopen(urlData)
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/urllib.py", line 87, in urlopen
May 01 00:23:51 xxx sh[46351]: return opener.open(url)
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/urllib.py", line 208, in open
May 01 00:23:51 xxx sh[46351]: return getattr(self, name)(url)
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/urllib.py", line 345, in open_http
May 01 00:23:51 xxx sh[46351]: h.endheaders(data)
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/httplib.py", line 975, in endheaders
May 01 00:23:51 xxx sh[46351]: self._send_output(message_body)
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/httplib.py", line 835, in _send_output
May 01 00:23:51 xxx sh[46351]: self.send(msg)
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/httplib.py", line 797, in send
May 01 00:23:51 xxx sh[46351]: self.connect()
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/httplib.py", line 778, in connect
May 01 00:23:51 xxx sh[46351]: self.timeout, self.source_address)
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/socket.py", line 553, in create_connection
May 01 00:23:51 xxx sh[46351]: for res in getaddrinfo(host, port, 0, SOCK_STREAM):
May 01 00:23:51 xxx sh[46351]: IOError: [Errno socket error] [Errno -2] Name or service not known
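The retry requested above could be sketched as a small wrapper around the fetch, so a transient DNS or connection failure is retried instead of killing the collector (function name and backoff policy are illustrative; the original Python 2 script uses urllib.urlopen, shown here with Python 3's urllib.request):

```python
import time
import urllib.request

def fetch_with_retry(url, retries=5, backoff=10):
    """Return the response body, retrying transient network errors.

    DNS failures like the one in the traceback surface as IOError
    (OSError on Python 3), so only those are retried; the last
    failure is re-raised once the attempts are exhausted.
    """
    for attempt in range(1, retries + 1):
        try:
            return urllib.request.urlopen(url).read()
        except IOError as err:
            if attempt == retries:
                raise
            print("fetch failed (%s), retry %d/%d in %ds" % (err, attempt, retries, backoff))
            time.sleep(backoff)
```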

Search Rate / Latency

Hi,

The keys "primaries.search.query_total" and "primaries.search.query_time_in_millis" do not have the same unit, so they should not be on the same graph.

Can't get any Data

Hi,

I found your dashboard and tried to make it work, but I don't really understand what you mean in the referenced instructions (screenshot omitted).

Can you please help me?

How to force refresh when switching clusters on dashboard?

First, thanks for a great dashboard!

I augmented the Python data gathering script and then rewrote it in Go (for speed and to follow internal preference for our Elasticsearch tools). We wanted to use one instance of the tool to collect data on two or more clusters.

I'm now feeding stats from 3 clusters into the Grafana ES instance. When I change the cluster by using the dropdown menu, I cannot seem to make the panels on the dashboard refresh.

field expansion matches too many fields

After adding the 5th node to the cluster, I get the following error in the ELK log file when I try to update the dashboard.

It seems the index should be created with a valid index.query.default_field
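Following that suggestion: on ES 6+, a query_string with no explicit field expands across all mapped fields and can hit the field-expansion limit as the mapping grows with each node. The setting name below is real, but choosing "name" as the default field is an assumption about this dashboard's queries:

```json
PUT elasticsearch_metrics-*/_settings
{
  "index.query.default_field": "name"
}
```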

Any suggestions ?

thanks in advance

Ale

Why?

Why did you write an exporter yourself? Elasticsearch (as well as Logstash, by the way) exposes all of its metrics out of the box if you simply add the following to your elasticsearch.yml (or logstash.yml):

xpack.monitoring.enabled: true
xpack.monitoring.collection.enabled: true

See here for elasticsearch and here for logstash.

Afterwards you get a monitoring-es (and a monitoring-logstash) index in your cluster with all of the metrics. I think this should be the way to go, because then you do not need to adapt your own exporter whenever a new version of ES or Logstash comes out.
