support-diagnostics's People

Contributors

111andre111, andyhunt66, antbell, brunofarache, cleydyr, crispybacon, danielmitterdorfer, davecturner, edmocosta, geekpete, glenrsmith, gunnerva, gwbrown, inqueue, jakommo, jasontedor, jguay, jjfalling, jpcarey, leaf-lin, lucabelluccini, luizgpsantos, markwalkom, martijnvg, nemonster, octavioranieri-zz, pickypg, sakurai-youhei, stefnestor, tomonorisoejima

support-diagnostics's Issues

Remove AWS credentials from generated tarball

Every time we send the output of a tool run to ES support, we have to manually remove the AWS credentials from the following section.

From cluster_state.20150219-122235.json:

"repositories" : {
  "202" : {
    "type" : "s3",
    "settings" : {
      "region" : "us-east-1",
      "max_restore_bytes_per_sec" : "1mb",
      "max_snapshot_bytes_per_sec" : "1mb",
      "bucket" : "es-snapshot.appcelerator.prod.202",
      "access_key" : "",
      "secret_key" : ""
    }
  }
},

Please remove them automatically to keep customers safe.
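A minimal sketch of what such a filter could look like, assuming the cluster state is written as JSON and that key values contain no escaped quotes. The function name mask_repo_credentials and the REDACTED placeholder are hypothetical, not part of the tool:

```shell
# Hypothetical helper: replace the values of access_key / secret_key
# with a placeholder. Reads JSON on stdin, writes the result to stdout.
mask_repo_credentials() {
    sed -E 's/("(access_key|secret_key)"[[:space:]]*:[[:space:]]*")[^"]*"/\1REDACTED"/g'
}
```

It could be applied to each JSON file before the tarball is built, for example: mask_repo_credentials < cluster_state.json > cluster_state.masked.json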

include filesystem type?

It would be great if the output could include the type of filesystem (e.g. ext2, nfs, whatever).

A long-term solution is to add this information (which comes from the Java 7 FileStore API) to the ES stats, but as a start, maybe we could include the output of df -k and mount, or similar?

Implement log sanitization

The following need to be replaced with dummy data:

  • IPs;
  • ports;
  • hostnames;
  • any other sensitive information.

Ideally, the replacement should be consistent, so that information can still be correlated during troubleshooting.

For example:
Real data:
dedicated master srv23.secret.domain 10.37.12.32 HTTP 9200 Transport 9300
data node srv24.secret.domain 10.37.12.33 HTTP 9201 Transport 9301
data node srv25.secret.domain 10.37.12.34 HTTP 9202 Transport 9302

Sanitised data:
dedicated master dm.x.y x.x.x.32 HTTP 1 Transport 10
data node 1 -> dn.x.y x.x.x.33 HTTP 2 Transport 11
data node 2 -> dn.x.y x.x.x.34 HTTP 3 Transport 12
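As a rough sketch of the idea, the filter below masks the first three octets of IPv4 addresses and replaces a given domain suffix, keeping the last octet and the short host label so lines can still be correlated. It does not remap ports or rename hosts, and it would also rewrite any other four-part dotted number. The function name sanitize_stream is hypothetical:

```shell
# Hypothetical filter: $1 is the domain suffix to hide (e.g. secret.domain).
# Reads log lines on stdin, writes sanitised lines to stdout.
sanitize_stream() {
    local domain="$1" escaped
    # Escape dots so the domain is matched literally by sed.
    escaped=$(printf '%s' "$domain" | sed 's/\./\\./g')
    sed -E \
        -e 's/([0-9]{1,3}\.){3}([0-9]{1,3})/x.x.x.\2/g' \
        -e "s/\\.${escaped}/.x.y/g"
}
```

For example: sanitize_stream secret.domain < elasticsearch.log > elasticsearch.sanitised.log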

require dependent options

When using authentication, -c requires -a, at least with BASIC (untested with cookie). If -c is given without -a, the script should exit with an error.
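The check could be as small as the sketch below. It assumes the script has already parsed the values of -c and -a into two variables; the function name check_auth_options is hypothetical:

```shell
# Hypothetical check: $1 is the value given for -c (empty if absent),
# $2 is the value given for -a (empty if absent).
# Returns 1 with a message on stderr when -c is used without -a.
check_auth_options() {
    local credential="$1" auth_type="$2"
    if [ -n "$credential" ] && [ -z "$auth_type" ]; then
        echo "Error: -c requires -a (authentication type)" >&2
        return 1
    fi
    return 0
}
```

The caller would run check_auth_options "$credential" "$auth_type" || exit 1 right after option parsing.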

Add _template

Can we also add /_template?pretty to the diagnostic dump? thx

Capture the last few days of marvel data

There are lots of times when Marvel would help me help users diagnose problems. If Marvel is in use for a cluster, I would like to capture at least the last 2 days of Marvel data in a form that I can import into my own cluster for local analysis.

Return every thread (not just top 10 hot ones)?

When we pull diagnostics it's often useful to see not just the hot threads but also any threads stuck waiting on a lock or IO operation. Because we only pull the top 10 "hot" threads now, we won't (necessarily) see the stuck ones consuming 0% CPU.

I think we should show all threads?

Add flag to disable ?pretty

Large clusters can produce huge cluster states when using ?pretty which may not be desirable. A flag should be added to disable this.
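One possible shape for this, assuming every request URL is built in one place. The function name build_url and its second argument are hypothetical:

```shell
# Hypothetical helper: $1 is the endpoint, $2 is "true" (default) to
# request pretty-printed JSON or anything else to leave the URL unchanged.
build_url() {
    local endpoint="$1" pretty="${2:-true}"
    if [ "$pretty" = true ]; then
        case $endpoint in
            *\?*) echo "${endpoint}&pretty" ;;   # already has a query string
            *)    echo "${endpoint}?pretty" ;;
        esac
    else
        echo "$endpoint"
    fi
}
```

A command-line flag would then only need to set the second argument to false.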

Add _cat/shards?v

Please add /_cat/shards?v as an additional output from the tool, we are finding it useful when looking at unbalanced shard allocation. thx!

Plugin does not install on ES 2.0

There are 2 issues to address for ES 2.0 with the existing plugin:

  1. The installation step using --install no longer works in 2.0:

Change:

./bin/plugin --install elasticsearch/elasticsearch-support-diagnostics

To:

./bin/plugin install elasticsearch/elasticsearch-support-diagnostics

  2. The plugin does not install via the plugin command above because it is missing the now-required plugin-descriptor.properties

support-diagnostics.sh requires execute permission

bin/support-diagnostics$ ls -arlth
total 24K
-rw-r--r-- 1 elk elk 8.8K Sep 24 14:10 support-diagnostics.sh

Either make it executable by default OR specify in usage docs to chmod +x bin/support-diagnostics/support-diagnostics.sh

Cannot install plugin on 1.5.0 version of ES

Running the command recommended in the documentation, ./bin/plugin --install elasticsearch/elasticsearch-support-diagnostics, fails with the following:

-> Installing elasticsearch/elasticsearch-support-diagnostics...
Trying https://github.com/elasticsearch/elasticsearch-support-diagnostics/archive/master.zip...
Failed to install elasticsearch/elasticsearch-support-diagnostics, reason: failed to download out of all possible locations..., use --verbose to get detailed information

Using ./bin/plugin --install elastic/elasticsearch-support-diagnostics seems to work.

Side note: most of the links in this repo's documentation point to the old "elasticsearch/..." GitHub repo instead of the new "elastic/...", so I'm assuming this bug is an artifact of that as well.

Better Retrieval of Configuration

The current script only copies configuration files from --path.config.

However, if the user specified --config to point elasticsearch.yml to a totally different location, that file is not copied. We should support --config as well.

Checksum for diagnostics

Would it be possible to create a checksum for all included files, just to be sure that nothing has been corrupted (or manually changed) in transit?
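A sketch of one way to do this, assuming the coreutils sha256sum command is available on the host. The function names and the SHA256SUMS manifest name are hypothetical:

```shell
# Hypothetical helper: write a SHA-256 manifest (SHA256SUMS) covering
# every regular file under the directory given as $1.
write_checksums() {
    (
        cd "$1" || exit 1
        find . -type f ! -name SHA256SUMS -exec sha256sum {} + > SHA256SUMS
    )
}

# Hypothetical helper: return 0 only if every file still matches the manifest.
verify_checksums() {
    ( cd "$1" && sha256sum -c SHA256SUMS > /dev/null )
}
```

The manifest would be written just before the directory is archived, and the recipient would run the verify step after unpacking.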

Filter logs

As originally requested by @dadoonet :

Maybe we don't need to have 30 days of log files (depending on their logging settings).
Could it be possible to specify something like --days 1 to get only the last day of logs or so?

I think our default log4j settings do daily rollovers, so this certainly seems possible.
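A minimal sketch of the selection step, assuming log files are picked by modification time rather than by the date in their names. The function name recent_logs and the '*.log*' pattern are assumptions:

```shell
# Hypothetical helper: print log files under $1 that were modified
# within the last $2 days (find's -mtime counts in 24-hour units).
recent_logs() {
    local log_dir="$1" days="$2"
    find "$log_dir" -type f -name '*.log*' -mtime "-$days"
}
```

With --days 1 the script would then copy only the files printed by recent_logs "$logdir" 1.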

_cat/recovery written as .json file, but should be .txt

The _cat/recovery request is redirected into a JSON file in both versions of the script even though it does not return JSON. The two scripts also echo slightly different names for this step.

support-diagnostics.sh:

echo "Getting _/recovery"
curl -XGET "$eshost/_cat/recovery?v" >> $outputdir/cat_recovery.json 2> /dev/null

support-diagnostics.ps1:

Write-Host 'Getting _cat/recovery'
Invoke-WebRequest $esHost'/_cat/recovery?v' -OutFile $outputDir/cat_recovery.json
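One way to avoid this class of mismatch would be to derive the output file name from the endpoint instead of typing it by hand. The sketch below assumes _cat endpoints return plain text and everything else returns JSON; the function name output_name is hypothetical:

```shell
# Hypothetical helper: map an endpoint such as /_cat/recovery?v to an
# output file name, using .txt for _cat endpoints and .json otherwise.
output_name() {
    local endpoint="$1" name
    name=${endpoint%%\?*}                           # drop the query string
    name=${name#/}                                  # drop a leading slash
    name=${name#_}                                  # drop the leading underscore
    name=$(printf '%s' "$name" | tr '/' '_')        # remaining slashes become underscores
    case $endpoint in
        /_cat/*|_cat/*) echo "$name.txt" ;;
        *)              echo "$name.json" ;;
    esac
}
```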

Flip timestamp and hostname in default filename

Recommend flipping hostname and timestamp.

Flipping them would better sort directories if users reran the script by grouping by hostname, then date/time.

The difference between

  • support-diagnostics.20140929-171450.host1.tar.gz
  • support-diagnostics.20140929-171551.host2.tar.gz
  • support-diagnostics.20140929-171811.host1.tar.gz
  • support-diagnostics.20140929-172012.host2.tar.gz

and

  • support-diagnostics.host1.20140929-171450.tar.gz
  • support-diagnostics.host1.20140929-171811.tar.gz
  • support-diagnostics.host2.20140929-171551.tar.gz
  • support-diagnostics.host2.20140929-172012.tar.gz

Plus, the user should be able to sort by the file's creation time if they want the reverse.
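The proposed ordering, sketched as a small helper. The function name archive_name is hypothetical, and the timestamp format is taken from the file names above:

```shell
# Hypothetical helper: build the archive name with the hostname first,
# then the timestamp. $1 is the hostname, $2 an optional timestamp.
archive_name() {
    local host="$1" stamp="${2:-$(date +%Y%m%d-%H%M%S)}"
    echo "support-diagnostics.${host}.${stamp}.tar.gz"
}
```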

Add _cat/shards Logstash Configuration

As some of the output gets large, I have started to find it can be convenient to send it right back into a local instance of the ELK stack to analyze.

In my case, I found it convenient to look at _cat/shards to analyze where all the space was going.

My intent here is that people can run the support scripts, and then use ELK to analyze the results on their own using configurations. These can be added as they come up.

include _count output?

Today we can see the number of lucene documents, but when nested documents are in place, it would be really useful to know the number of "real" user-level documents.

No error if authentication fails

If Shield is enabled on the cluster and the user forgets to pass the -c and -p parameters (username/password), support-diagnostics completes without errors, but every file contains only:

{
  "error" : "AuthenticationException[missing authentication token for REST request [/?pretty]]",
  "status" : 401
}

We need to add a check to ensure that authentication is used.

Cheers,
-Robin-
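A sketch of an up-front check, assuming curl is used for the requests. It asks curl for the HTTP status code only and stops on 401 or 403. The function name check_auth is hypothetical; the caller passes the URL and any authentication options:

```shell
# Hypothetical pre-flight check: all arguments are handed to curl.
# Returns 1 if the cluster rejects the request or cannot be reached.
check_auth() {
    local status
    status=$(curl -s -o /dev/null -w '%{http_code}' "$@")
    case $status in
        401|403) echo "Error: authentication failed (HTTP $status)" >&2; return 1 ;;
        000)     echo "Error: could not connect" >&2; return 1 ;;
    esac
    return 0
}
```

Running check_auth "$eshost" || exit 1 before the first real request would turn the silent failure into an explicit one.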

Complaints in logs about missing _site directory

After installing this plugin, we've started to see log lines like this:

[2015-06-25 22:06:14,292][DEBUG][plugins                  ] [c1b-searchb3-prod] [/opt/elasticsearch/plugins/support-diagnostics/_site] directory does not exist.
[2015-06-25 22:06:24,311][DEBUG][plugins                  ] [c1b-searchb3-prod] [/opt/elasticsearch/plugins/support-diagnostics/_site] directory does not exist.
[2015-06-25 22:06:34,346][DEBUG][plugins                  ] [c1b-searchb3-prod] [/opt/elasticsearch/plugins/support-diagnostics/_site] directory does not exist.
[2015-06-25 22:06:44,365][DEBUG][plugins                  ] [c1b-searchb3-prod] [/opt/elasticsearch/plugins/support-diagnostics/_site] directory does not exist.
[2015-06-25 22:06:54,382][DEBUG][plugins                  ] [c1b-searchb3-prod] [/opt/elasticsearch/plugins/support-diagnostics/_site] directory does not exist.
[2015-06-25 22:07:04,400][DEBUG][plugins                  ] [c1b-searchb3-prod] [/opt/elasticsearch/plugins/support-diagnostics/_site] directory does not exist.

This looks like a bug to me. We install a bunch of other plugins and haven't seen this issue before. From what I can tell it's harmless but annoying.

Please let me know if this is an issue with our particular install - if not, can it be fixed?

include pidstat on linux

The current top output is just a 1-second snapshot in time.

On Linux, can we get the pid file and collect global process statistics for the ES process? Something like pidstat -druvw -p

This gives a global summary of things like page faults/second, IO rate/second and context switches/second, as well as some things top doesn't show, like the number of file descriptors and threads.

Remove rmdir $outputdir line or need some additional checks to prevent accidental deletion of important directories

While this (https://github.com/elasticsearch/elasticsearch-support-diagnostics/blob/master/bin/support-diagnostics.sh#L80) is a convenience feature (it removes the outputdir if the host is not reachable), it is quite dangerous, especially if the script is run with sudo or root privileges. We had a scenario in the field where the host given with -H was not reachable and the user happened to have -o set to the directory where ES is installed. As a result, that directory was removed and the ES installation was gone ...

#ensure we can connect to the host, or exit as there is nothing more we can do
connectionTest=`curl -s -S -XGET $eshost 2>&1`
if [ $? -ne 0 ]
then
    echo "Error connecting to $eshost: $connectionTest"
    rmdir $outputdir
    exit 1
fi
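One possible safeguard, sketched under the assumption that the script creates the output directory itself: remember whether this run created it, and only ever remove it in that case. The names prepare_outputdir, cleanup_outputdir and created_outputdir are hypothetical:

```shell
# Hypothetical guard: track whether this run created the output directory.
created_outputdir=false

prepare_outputdir() {
    if [ ! -d "$1" ]; then
        mkdir -p "$1" && created_outputdir=true
    fi
}

cleanup_outputdir() {
    # Never touch a directory that existed before this run.
    # rmdir itself also refuses to remove a non-empty directory.
    if [ "$created_outputdir" = true ]; then
        rmdir "$1" 2>/dev/null
    fi
    return 0
}
```

With this in place, pointing -o at an existing directory leaves that directory alone when the connection test fails.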

Missing features in 2.0 branch

Some features still need to be added to the 2.0 branch:

  • Multi run support
  • Authentication support
  • Top
  • Netstat
  • Also need to ensure it runs on both Java 7 and 8

Update available notification

It would be nice if this tool could display a message when a new version is available when the user runs the plugin. This way they don't need to check GitHub.

Add a timeout on curl commands

We should add the --max-time curl parameter and fail gracefully when any of the curl commands takes a long time. This would mean we potentially lose some output for the sake of completing faster, which is useful in time-sensitive, critical situations.
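A sketch of a wrapper that applies the limit and reports a timeout without aborting the whole run. curl exits with code 28 when --max-time is exceeded. The function name fetch and the 30-second default are assumptions:

```shell
# Hypothetical wrapper: $1 is the URL, $2 the output file,
# $3 the time limit in seconds (default 30, an arbitrary choice).
fetch() {
    local url="$1" out="$2" limit="${3:-30}" rc=0
    curl -s -S --max-time "$limit" -o "$out" "$url" || rc=$?
    if [ "$rc" -eq 28 ]; then
        echo "Skipped $url: no response within ${limit}s" >&2
    elif [ "$rc" -ne 0 ]; then
        echo "Failed $url: curl exit code $rc" >&2
    fi
    return "$rc"
}
```

The caller can ignore the return value to keep going, so a single slow endpoint only costs its own output.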

Capture cluster uuid

The cluster UUID is not exported by default. Including it would help link the diagnostics to data from other sources:

GET _cluster/stats?output_uuid=true

Allow fetching of all nodes automatically

I feel like reconciling the results could be pretty confusing if users need to run it against more than one node.

Granted, it's just a matter of opening it up, but that could get annoying pretty quickly. Long term, I suspect that we could automate the retrieval of all nodes by pre-fetching all of their names.

If diagnostics fails, fail with an error

If I run diagnostics.sh with incorrect parameters (e.g. invalid host), it completes with something like:

Using /usr/bin/java as Java Runtime
Using -Xms256m -Xmx2000m  for options.
Prompt for a password?  No password value required, only the option. Hidden from the command line on entry.: 
Getting Network Interface Information - this may take some time...
Run 1 of 1 completed.

No file created, no error posted... it looks as if it did something, but nothing happened. This confused me, and will surely confuse others.
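As an illustration of the requested behaviour in shell terms, a final step could verify that an archive was actually produced and return a non-zero status otherwise. The function name finish_run and both messages are hypothetical:

```shell
# Hypothetical final step: $1 is the path of the expected archive.
# Returns 1 with an error on stderr if it is missing or empty.
finish_run() {
    local archive="$1"
    if [ ! -s "$archive" ]; then
        echo "Error: no diagnostics archive was written to $archive" >&2
        return 1
    fi
    echo "Diagnostics written to $archive"
}
```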

Add cluster/health and pending_tasks

I find these to be handy, can we add these to the tool? :) thx

  • _cluster/health?pretty (quick view of the status of the cluster with aggregated numbers of unassigned shards, initializing shards, etc.)
  • _cluster/pending_tasks?pretty (to see if there are any slow, queued, stuck tasks potentially).

Multiple runs - one file per run

Some files are created once at the beginning of multiple runs (version.json) and others are created for each run.
Could we have one file per run that is complete in itself? This would make incremental processing easier, as each diagnostic would be complete, and it would rule out the unlikely case where one of the common files changes between runs.

AWS/Cloud provider key masking

Related to #30 but higher priority. Things like AWS keys can appear in the yml files and in the cluster state output. It would be nice to mask them automatically before packaging.

Diagnostics plugin should also pull segments API?

It would be helpful if we could see the segments API output when we pull diagnostics, e.g. this would let us see which Lucene version wrote which segments in each index.

    curl -XGET 'http://localhost:9200/_segments'
