
elasticsearch-metrics's People

Contributors

calmzeala, foorb, keyboardfann, qingsongyao, rogerdk, trevorndodds, trinitronx


elasticsearch-metrics's Issues

Use a code for the status

Hi,

It would be nice to have a return code for the status, so that a color can be set on the dashboard.

Best regards,

elasticsearch2elastic.py stops when the ES query takes too long

After elasticsearch2elastic.py has run for some time, it crashes because timediff < 0. I think this happens when it queries a busy ES cluster and the collection time becomes bigger than the interval.

[root@xxx init.d]# systemctl status eshealthcollector-prod 
โ— eshealthcollector-prod.service - Elasticsearch Health Collector - xxx Production Cluster
   Loaded: loaded (/usr/lib/systemd/system/eshealthcollector-prod.service; enabled; vendor preset: disabled)
   Active: active (exited) since Mon 2017-04-24 15:53:13 CST; 12min ago
  Process: 29808 ExecStop=/bin/sh /etc/init.d/eshealthcollector-prod stop (code=exited, status=0/SUCCESS)
  Process: 29818 ExecStart=/bin/sh /etc/init.d/eshealthcollector-prod start (code=exited, status=0/SUCCESS)
 Main PID: 29818 (code=exited, status=0/SUCCESS)
   CGroup: /system.slice/eshealthcollector-prod.service

Apr 24 16:02:54 xxx sh[29818]: time:1493020954.75
Apr 24 16:02:54 xxx sh[29818]: timediff8.77753591537
Apr 24 16:02:54 xxx sh[29818]: Total Elapsed Time: 10.9948761463
Apr 24 16:02:54 xxx sh[29818]: nextRun:1493020973.54
Apr 24 16:02:54 xxx sh[29818]: time:1493020974.53
Apr 24 16:02:54 xxx sh[29818]: timediff:-0.994902133942
Apr 24 16:02:54 xxx sh[29818]: Traceback (most recent call last):
Apr 24 16:02:54 xxx sh[29818]: File "/admin/scripts/eshealthcollector-prod/elasticsearch2elastic.py", line 112, in <module>
Apr 24 16:02:54 xxx sh[29818]: time.sleep(timeDiff)
Apr 24 16:02:54 xxx sh[29818]: IOError: [Errno 22] Invalid argument
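The traceback shows `time.sleep()` being handed a negative value, which raises `IOError: [Errno 22] Invalid argument` on Python 2. A minimal sketch of a fix, guarding the sleep (the function name is illustrative, not from the script):

```python
import time

def sleep_until(next_run):
    """Sleep until next_run, never passing a negative value to time.sleep().

    On a busy cluster a collection pass can take longer than the polling
    interval, making next_run - time.time() negative; clamping to zero
    makes the loop simply start the next pass immediately instead of
    crashing.
    """
    time_diff = next_run - time.time()
    if time_diff > 0:
        time.sleep(time_diff)
```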

Can't get elasticsearch cluster stats

Hi,

I'm using the following settings; when I run the script it shows the HTTP error mentioned below. I have also added the Elasticsearch datasource in Grafana.

# ElasticSearch Cluster to Monitor
elasticServer = os.environ.get('ES_METRICS_CLUSTER_URL', 'http://elasticserverIP:9200')
interval = int(os.environ.get('ES_METRICS_INTERVAL', '60'))

# ElasticSearch Cluster to Send Metrics
elasticIndex = os.environ.get('ES_METRICS_INDEX_NAME', '*')
elasticMonitoringCluster = os.environ.get('ES_METRICS_MONITORING_CLUSTER_URL', 'http://elasticserverIP:9200')

Error:

Error:  HTTP Error 400: Bad Request
Error:  HTTP Error 400: Bad Request
Error:  HTTP Error 400: Bad Request
Error:  HTTP Error 400: Bad Request
Error:  HTTP Error 400: Bad Request
Error:  HTTP Error 400: Bad Request
Error:  HTTP Error 400: Bad Request
Total Elapsed Time: 1.16779112816

How can I get rid of this issue?

Regards,
redhawk19

Can't connect to elasticsearch cluster with basic auth

I recently added auth to my ELK stack using readonlyrest. Elasticsearch-metrics was working fine before auth was enabled, but it won't connect anymore. Looking through elasticsearch2elastic.py, I see there is no code to handle a username/password combination and send it as part of the request.
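The script uses the standard library's urlopen with no credentials. One way to add basic auth would be a small helper like this sketch (written against Python 3's `urllib.request`; the function name and parameters are illustrative, not part of the script):

```python
import base64
import urllib.request

def open_with_basic_auth(url, username, password, timeout=10):
    """Open a URL with an HTTP Basic Auth header attached.

    Builds the base64 "user:password" token by hand and sets the
    Authorization header explicitly, which works with proxies like
    readonlyrest that expect the header on every request.
    """
    token = base64.b64encode("{}:{}".format(username, password).encode()).decode()
    request = urllib.request.Request(url, headers={"Authorization": "Basic " + token})
    return urllib.request.urlopen(request, timeout=timeout)
```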

Node OS, Node Stat Indices, Node JVM Displaying Error

None of the graphs mentioned above display data; they all throw the error below (Cannot parse name:()).
    "root_cause": [
        {
            "type": "parse_exception",
            "reason": "parse_exception: Encountered \" \")\" \") \"\" at line 1, column 6.\nWas expecting one of:\n    \"+\" ...\n    \"-\" ...\n    \"(\" ...\n    \"*\" ...\n    \"[\" ...\n    \"{\" ...\n    "
        }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
        {
            "shard": 0,
            "index": "elasticsearch_metrics-2017.01.27",
            "node": "Y9VfAr1aS0-QeiLEG9UmIg",
            "reason": {
                "type": "query_shard_exception",
                "reason": "Failed to parse query [name:()]",
                "index_uuid": "XoXqMb2GS0-_mcPsgESjZA",
                "index": "elasticsearch_metrics-2017.01.27",
                "caused_by": {
                    "type": "parse_exception",
                    "reason": "parse_exception: Cannot parse 'name:()': Encountered \" \")\" \") \"\" at line 1, column 6.\nWas expecting one of:\n    \"+\" ...\n    \"-\" ...\n    \"(\" ...\n    \"*\" ...\n    \"[\" ...\n    \"{\" ...\n    ",
                    "caused_by": {
                        "type": "parse_exception",
                        "reason": "parse_exception: Encountered \" \")\" \") \"\" at line 1, column 6.\nWas expecting one of:\n    \"+\" ...\n    \"-\" ...\n    \"(\" ...\n    \"*\" ...\n    \"[\" ...\n    \"{\" ...\n    "
                    }
                }
            }
        }
    ],
    "caused_by": {
        "type": "query_shard_exception",
        "reason": "Failed to parse query [name:()]",
        "index_uuid": "XoXqMb2GS0-_mcPsgESjZA",
        "index": "elasticsearch_metrics-2017.01.27",
        "caused_by": {
            "type": "parse_exception",
            "reason": "parse_exception: Cannot parse 'name:()': Encountered \" \")\" \") \"\" at line 1, column 6.\nWas expecting one of:\n    \"+\" ...\n    \"-\" ...\n    \"(\" ...\n    \"*\" ...\n    \"[\" ...\n    \"{\" ...\n    ",
            "caused_by": {
                "type": "parse_exception",
                "reason": "parse_exception: Encountered \" \")\" \") \"\" at line 1, column 6.\nWas expecting one of:\n    \"+\" ...\n    \"-\" ...\n    \"(\" ...\n    \"*\" ...\n    \"[\" ...\n    \"{\" ...\n    "
            }
        }
    }
}

Difference between ES_METRICS_CLUSTER_URL & ES_METRICS_MONITORING_CLUSTER_URL ?

Hi All,

Can you explain the difference between ES_METRICS_CLUSTER_URL and ES_METRICS_MONITORING_CLUSTER_URL? I am a little confused about what their values should be.
As far as I understand, ES_METRICS_CLUSTER_URL="http://elestic_serch_node1:9200, http://elestic_serch_node1:9200, http://elestic_serch_node1:9200" (all Elasticsearch cluster nodes). Is this correct?
Also, I want to know the value for ES_METRICS_MONITORING_CLUSTER_URL="????": is it the collector node URL or something else?

Thanks in advance.

Created a Service File for Linux

So I created a service file so that these metrics can be started automatically on reboot. It should work on any distro that uses systemd, but it has only been tested on Ubuntu 18.04. You'll also need to modify the path to the Python file:

[Unit]
Description=ElasticSearch Grafana Metrics Update Service
After=network.target
After=elasticsearch.service

[Service]
Type=simple
User=root
ExecStart=/opt/elasticsearch_metrics/elasticsearch2elastic.py

[Install]
WantedBy=multi-user.target
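To install it, something along these lines should work (the unit file name is illustrative):

```shell
# Assuming the unit above is saved as /etc/systemd/system/es-metrics.service
sudo systemctl daemon-reload
sudo systemctl enable --now es-metrics.service
systemctl status es-metrics.service
```

Note that `ExecStart` runs the script directly, so the file needs an executable bit and a shebang line; alternatively, prefix the path with the Python interpreter in `ExecStart`.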

ESv7.0 compatibility

I have tried the same dashboard with an Elasticsearch v7.6.0 datasource, but no data is displayed or fetched. Any suggestions? Awaiting a quick response.

Pull down for selecting Nodes works only for the first 10 items

I have an ES cluster with "n" nodes. The script correctly loads stats into the elasticsearch-metrics index for all "n" nodes, and the cluster-level dashboards populate fine for any selection. However, whether I select All or an individual node, only the first 10 nodes in the list display data in the node-level panels. The dropdown shows all "n" nodes correctly. I did try adding "size": n to the template, but it had no effect.
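For reference, the 10-item cut-off usually comes from the Terms group-by inside each panel's query, whose size defaults to 10 in Grafana's Elasticsearch datasource, not from the templating variable. A hedged sketch of the relevant fragment of a panel's JSON (the field name is illustrative); setting size to "0" means no limit:

```json
"bucketAggs": [
  {
    "type": "terms",
    "field": "node_name",
    "settings": { "size": "0", "order": "desc", "orderBy": "_term" }
  }
]
```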

Usage in grafana

  1. I have already used the dashboard, and thanks for the great idea. At first I was confused about how to integrate it with Grafana; I finally made it work, so I'd like to add a supplement.
  2. Steps:
    1. My environment: Python 2.7, Grafana 4.4.3, Elasticsearch 5.x.
    2. Clone the repo to your host and run cd elasticsearch-metrics/Grafana && python elasticsearch2elastic.py.
    3. I suggest you read elasticsearch2elastic.py first.
    4. Open Grafana, choose Dashboard => Import, and paste the dashboard id into the input box.
    5. Grafana will load the dashboard automatically.
    6. Fill in the data source name for the elasticsearch_prod_metrics choice; the data is created by elasticsearch2elastic.py. Before this step you need to go to DataSource and add a new source whose index name is elasticsearch_metrics*.
    7. Choose the datasource, click Import, and enjoy it.

No data displayed, fails to load cluster or node name

Hello,

The whole dashboard is red, so no data is loaded. There are many errors; this is one of them:

    "root_cause": [
        {
            "type": "parse_exception",
            "reason": "parse_exception: Encountered \"\" at line 1, column 13.\nWas expecting one of:\n     ...\n    \"(\" ...\n    \"*\" ...\n     ...\n     ...\n     ...\n     ...\n     ...\n    \"[\" ...\n    \"{\" ...\n     ...\n    "
        },
        {
            "type": "parse_exception",
            "reason": "parse_exception: Encountered \"\" at line 1, column 13.\nWas expecting one of:\n     ...\n    \"(\" ...\n    \"*\" ...\n     ...\n     ...\n     ...\n     ...\n     ...\n    \"[\" ...\n    \"{\" ...\n     ...\n    "
        }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
        {
            "shard": 0,
            "index": "graylog_0",
            "node": "qX82JCzNQFyLACYz8s60zA",
            "reason": {
                "type": "query_parsing_exception",
                "reason": "Failed to parse query [cluster_name:]",
                "index": "graylog_0",
                "line": 1,
                "col": 195,
                "caused_by": {
                    "type": "parse_exception",
                    "reason": "parse_exception: Cannot parse 'cluster_name:': Encountered \"\" at line 1, column 13.\nWas expecting one of:\n     ...\n    \"(\" ...\n    \"*\" ...\n     ...\n     ...\n     ...\n     ...\n     ...\n    \"[\" ...\n    \"{\" ...\n     ...\n    ",
                    "caused_by": {
                        "type": "parse_exception",
                        "reason": "parse_exception: Encountered \"\" at line 1, column 13.\nWas expecting one of:\n     ...\n    \"(\" ...\n    \"*\" ...\n     ...\n     ...\n     ...\n     ...\n     ...\n    \"[\" ...\n    \"{\" ...\n     ...\n    "
                    }
                }
            }
        },
        {
            "shard": 0,
            "index": "graylog_1",
            "node": "qX82JCzNQFyLACYz8s60zA",
            "reason": {
                "type": "query_parsing_exception",
                "reason": "Failed to parse query [cluster_name:]",
                "index": "graylog_1",
                "line": 1,
                "col": 195,
                "caused_by": {
                    "type": "parse_exception",
                    "reason": "parse_exception: Cannot parse 'cluster_name:': Encountered \"\" at line 1, column 13.\nWas expecting one of:\n     ...\n    \"(\" ...\n    \"*\" ...\n     ...\n     ...\n     ...\n     ...\n     ...\n    \"[\" ...\n    \"{\" ...\n     ...\n    ",
                    "caused_by": {
                        "type": "parse_exception",
                        "reason": "parse_exception: Encountered \"\" at line 1, column 13.\nWas expecting one of:\n     ...\n    \"(\" ...\n    \"*\" ...\n     ...\n     ...\n     ...\n     ...\n     ...\n    \"[\" ...\n    \"{\" ...\n     ...\n    "
                    }
                }
            }
        }
    ]
}

This is the templating configuration (screenshot omitted).

What could it be?

Cluster and node names with "-"

This may not be the right place to ask, but I will try.
From the Grafana Labs page:

If your cluster_name or node names have "-" you will have to load a custom index to set the "name" field to not_analyzed

What exactly do you mean by loading a custom index? The issue I see is that filtering by cluster name does not actually filter; the full data set is returned for every cluster selection.
If I convert the panel queries to cluster_name.keyword:$Cluster, the filtering works.
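For what it's worth, "loading a custom index" most likely means applying an index template so the name fields are not analyzed before the metrics indices are created. A minimal sketch for ES 5.x, using the message document type the script writes to (the exact field list is an assumption):

```json
PUT _template/elasticsearch_metrics
{
  "template": "elasticsearch_metrics-*",
  "mappings": {
    "message": {
      "properties": {
        "cluster_name": { "type": "keyword" },
        "node_name":    { "type": "keyword" }
      }
    }
  }
}
```

With the fields mapped as keyword, queries like cluster_name:$Cluster match the whole hyphenated name instead of its tokenized parts.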

why we need index+date as index name

Hi I saw code

url = "%(cluster)s/%(index)s-%(index_period)s/message" % url_parameters

I'm wondering why we create a different index for each day's metrics.
The Grafana data source needs to specify an index name,
so does the dashboard only show the metrics of a specific date?
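For reference, Grafana's Elasticsearch datasource understands date-patterned index names, so daily indices do not limit the dashboard to a single date. A sketch of the datasource settings, assuming the script's default index prefix:

```
Index name: [elasticsearch_metrics-]YYYY.MM.DD
Pattern:    Daily
```

Grafana then resolves the pattern to the set of daily indices covering the selected time range, which also keeps each index small and easy to expire.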

elasticsearch2elastic.py crashes because it can't get the cluster name

elasticsearch2elastic.py exits under heavy ES cluster load or network problems. When the program stops, we have to restart it manually. Could we have a retry mechanism?

May 01 00:23:51 xxx sh[46351]: Total Elapsed Time: 0.115025043488
May 01 00:23:51 xxx sh[46351]: Total Elapsed Time: 0.173295974731
May 01 00:23:51 xxx sh[46351]: Traceback (most recent call last):
May 01 00:23:51 xxx sh[46351]: File "/admin/scripts/eshealthcollector-stage/elasticsearch2elastic-stage.py", line 118, in <module>
May 01 00:23:51 xxx sh[46351]: main()
May 01 00:23:51 xxx sh[46351]: File "/admin/scripts/eshealthcollector-stage/elasticsearch2elastic-stage.py", line 99, in main
May 01 00:23:51 xxx sh[46351]: fetch_nodestats(clusterName)
May 01 00:23:51 xxx sh[46351]: File "/admin/scripts/eshealthcollector-stage/elasticsearch2elastic-stage.py", line 55, in fetch_nodestats
May 01 00:23:51 xxx sh[46351]: response = urllib.urlopen(urlData)
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/urllib.py", line 87, in urlopen
May 01 00:23:51 xxx sh[46351]: return opener.open(url)
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/urllib.py", line 208, in open
May 01 00:23:51 xxx sh[46351]: return getattr(self, name)(url)
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/urllib.py", line 345, in open_http
May 01 00:23:51 xxx sh[46351]: h.endheaders(data)
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/httplib.py", line 975, in endheaders
May 01 00:23:51 xxx sh[46351]: self._send_output(message_body)
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/httplib.py", line 835, in _send_output
May 01 00:23:51 xxx sh[46351]: self.send(msg)
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/httplib.py", line 797, in send
May 01 00:23:51 xxx sh[46351]: self.connect()
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/httplib.py", line 778, in connect
May 01 00:23:51 xxx sh[46351]: self.timeout, self.source_address)
May 01 00:23:51 xxx sh[46351]: File "/usr/lib64/python2.7/socket.py", line 553, in create_connection
May 01 00:23:51 xxx sh[46351]: for res in getaddrinfo(host, port, 0, SOCK_STREAM):
May 01 00:23:51 xxx sh[46351]: IOError: [Errno socket error] [Errno -2] Name or service not known
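The retry requested above could be sketched as a small wrapper around the fetch, so a transient DNS or connection failure is retried instead of killing the collector (function name and backoff policy are illustrative; the original Python 2 script uses urllib.urlopen, shown here with Python 3's urllib.request):

```python
import time
import urllib.request

def fetch_with_retry(url, retries=5, backoff=10):
    """Return the response body, retrying transient network errors.

    DNS failures like the one in the traceback surface as IOError
    (OSError on Python 3), so only those are retried; the last
    failure is re-raised once the attempts are exhausted.
    """
    for attempt in range(1, retries + 1):
        try:
            return urllib.request.urlopen(url).read()
        except IOError as err:
            if attempt == retries:
                raise
            print("fetch failed (%s), retry %d/%d in %ds" % (err, attempt, retries, backoff))
            time.sleep(backoff)
```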

Search Rate / Latency

Hi,

The keys "primaries.search.query_total" and "primaries.search.query_time_in_millis" do not have the same unit, so they should not be on the same graph.

Can't get any Data

Hi,

I found your dashboard and tried to make it work, but I don't really understand what you mean in the referenced instructions (screenshot omitted).

Can you please help me?

How to force refresh when switching clusters on dashboard?

First, thanks for a great dashboard!

I augmented the Python data gathering script and then rewrote it in Go (for speed and to follow internal preference for our Elasticsearch tools). We wanted to use one instance of the tool to collect data on two or more clusters.

I'm now feeding stats from 3 clusters into the Grafana ES instance. When I change the cluster by using the dropdown menu, I cannot seem to make the panels on the dashboard refresh.

field expansion matches too many fields

After adding the 5th node to the cluster, I get the following error in the ELK log file when I try to update the dashboard.

It seems the index should be created with a valid index.query.default_field
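Following that suggestion: on ES 6+, a query_string with no explicit field expands across all mapped fields and can hit the field-expansion limit as the mapping grows with each node. The setting name below is real, but choosing "name" as the default field is an assumption about this dashboard's queries:

```json
PUT elasticsearch_metrics-*/_settings
{
  "index.query.default_field": "name"
}
```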

Any suggestions ?

thanks in advance

Ale

Why?

Why did you write an exporter yourself? Elasticsearch (as well as Logstash, by the way) exposes all of its metrics out of the box if you simply add the following to your elasticsearch.yml (or logstash.yml):

xpack.monitoring.enabled: true
xpack.monitoring.collection.enabled: true

See here for elasticsearch and here for logstash.

Afterwards you get a monitoring-es (and a monitoring-logstash) index in your cluster with all of the metrics. I think this should be the way to go, because then you do not need to adapt your own exporter whenever a new version of ES or Logstash comes out.
