watch-and-recover's Introduction

At a glance

watch-and-recover.py is a simple script that supervises a set of processes and restarts any that are found dead.

Unlike other solutions, watch-and-recover lets you group processes to express dependencies between them. Recovering one process may stop and restart other processes that are also under the control of watch-and-recover.

For example, suppose your app consists of several sub-processes, all of which are supervised. If one of them is not running, a recovery script must be triggered that shuts down all sub-processes and restarts the whole application.

The script is intended to be run from cron by the user who is allowed to execute the recovery action.

The script has very few dependencies so it can be used on Linux and AIX.

Installation

Install the script in a directory on your PATH and make it executable:

curl -L -s https://github.com/thorstenkramm/watch-and-recover/raw/master/watch-and-recover.py > /usr/local/bin/watch-and-recover
chmod +x /usr/local/bin/watch-and-recover

Create a configuration using the example and adjust it to your needs.

Monitoring with Zabbix

The script can report the status of all supervised processes and of the recovery actions to a Zabbix server using a locally installed Zabbix sender. To connect the script to a Zabbix server, import the appropriate template and assign it to the host the script runs on. zabbix_sender is invoked with the configuration of the Zabbix agent, so the agent's active mode must already be configured and working.

Using Zabbix sender is optional. If you remove the zabbix_sender_bin option from the config file, the script works without sending any data to a Zabbix server.

Usage

watch-and-recover -h prints a brief help message.

usage: watch-and-recover.py [-h] [-v] [-pd] [-pj] [-pg] [-c CONFIG]

Watch and recover processes

optional arguments:
  -h, --help            show this help message and exit
  -v, --verbose         increases log verbosity for each occurrence.
  -pd, --print-discovery
                        Dump the discovery json object to the console. Forces
                        sending of the discovery too.
  -pj, --print-jobs     Dump the list of watch jobs
  -pg, --print-groups   Dump the list of groups
  -c CONFIG, --config CONFIG
                        Location of config. If not given ~/.watch-and-
                        recover.cfg is taken.

If the config is placed in ~/.watch-and-recover.cfg, the script can be invoked without any parameters.

Example

Consider the following config file snippet:

[watch:app]
watch_for = java -jar app.jar
recover_with = recovery.sh full
group = my_app

[watch:sub1]
watch_for = java -jar app.jar -Dproperty.foo=sub1
recover_with = recovery.sh sub1
group = my_app

[watch:sub2]
watch_for = java -jar app.jar -Dproperty.foo=sub2
recover_with = recovery.sh sub2
group = my_app

[group:my_app]
delay = 300
tries = 3
cwd = /opt/my_app/scripts/
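The snippet above is INI-style, so it can be read with Python's standard configparser module. The following is only a sketch of how such a file might be split into watch jobs and groups; the function name load_jobs_and_groups and the returned data layout are assumptions made for illustration, not taken from the script itself.

```python
import configparser

EXAMPLE_CONFIG = """\
[watch:app]
watch_for = java -jar app.jar
recover_with = recovery.sh full
group = my_app

[watch:sub1]
watch_for = java -jar app.jar -Dproperty.foo=sub1
recover_with = recovery.sh sub1
group = my_app

[group:my_app]
delay = 300
tries = 3
cwd = /opt/my_app/scripts/
"""


def load_jobs_and_groups(text):
    """Split an INI-style config into a job list and a group dict.

    Hypothetical helper: names and return shape are illustrative only.
    """
    # Only "=" is used as delimiter and interpolation is disabled so that
    # command lines containing ":" or "%" are kept verbatim.
    parser = configparser.ConfigParser(delimiters=("=",), interpolation=None)
    parser.read_string(text)

    jobs, groups = [], {}
    # sections() preserves the order of appearance in the file.
    for section in parser.sections():
        kind, _, name = section.partition(":")
        options = dict(parser[section])
        if kind == "watch":
            options["name"] = name
            jobs.append(options)
        elif kind == "group":
            groups[name] = options
    return jobs, groups


jobs, groups = load_jobs_and_groups(EXAMPLE_CONFIG)
for job in jobs:
    print(job["name"], "->", job["group"])
```

All values come back as strings, so numeric options such as delay and tries would still need an explicit int() conversion.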

The jobs are processed in order of appearance. The script runs ps -ef to get the list of running processes. Each watch_for string is treated as a regular expression and matched against the CMD column of the ps output.
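The matching step described above can be sketched roughly as follows. This is not the script's actual code: the helper names are hypothetical, the ps output is a hand-built sample instead of a real ps -ef call, and two assumptions are made, namely that the CMD column starts at the offset of the "CMD" header and that patterns are applied unanchored with re.search.

```python
import re

# Build a small, column-aligned sample instead of running "ps -ef".
ROW = "{:<8} {:>5} {:>5} {:>2} {:<5} {:<8} {:>8} {}"
SAMPLE_PS = "\n".join([
    ROW.format("UID", "PID", "PPID", "C", "STIME", "TTY", "TIME", "CMD"),
    ROW.format("root", 1, 0, 0, "09:00", "?", "00:00:01", "/sbin/init"),
    ROW.format("app", 2101, 1, 0, "09:01", "?", "00:00:12",
               "java -jar app.jar -Dproperty.foo=sub1"),
    ROW.format("app", 2102, 1, 0, "09:01", "?", "00:00:09",
               "java -jar app.jar -Dproperty.foo=sub2"),
])


def command_column(ps_output):
    """Return the CMD column of every process line (assumed layout)."""
    lines = ps_output.splitlines()
    if not lines:
        return []
    offset = lines[0].find("CMD")
    if offset < 0:
        raise ValueError("no CMD header found in ps output")
    return [line[offset:] for line in lines[1:] if len(line) > offset]


def missing_jobs(patterns, ps_output):
    """Return the names of jobs whose pattern matches no running command."""
    commands = command_column(ps_output)
    return [
        name
        for name, pattern in patterns.items()
        if not any(re.search(pattern, cmd) for cmd in commands)
    ]


patterns = {
    "app": "java -jar app.jar",
    "sub1": "java -jar app.jar -Dproperty.foo=sub1",
    "sub3": "java -jar app.jar -Dproperty.foo=sub3",
}
print(missing_jobs(patterns, SAMPLE_PS))
```

Under the unanchored-search assumption, the pattern java -jar app.jar also matches the command lines of sub1 and sub2, so the job "app" counts as running here even though only sub-processes are present. A stricter pattern, for example one ending in $, would avoid that.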

If no process matches java -jar app.jar, the script changes to /opt/my_app/scripts/ and runs nohup ./recovery.sh full &. A 300-second clock then starts, and all recovery actions for the same group are ignored while it runs. If java -jar app.jar or any other process in the group is still not running afterwards, up to two more tries are made. Once all three tries have been used, no further action is taken for any process in the group. The clock and the tries counter are reset as soon as all processes in the group are running again.
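The delay and tries rules above amount to a small per-group state machine. The sketch below models it under stated assumptions: the class GroupState, its decide method, and the returned action names are invented for illustration, and the state is kept in memory, whereas a script started from cron would have to persist it between runs (how the real script does that is not described here).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroupState:
    """Hypothetical per-group bookkeeping for delay and tries."""
    delay: float                       # seconds to wait after a recovery
    tries: int                         # maximum number of recovery attempts
    attempts: int = 0
    last_attempt: Optional[float] = None

    def decide(self, any_missing, now):
        """Return 'ok', 'wait', 'give_up' or 'recover' for this run."""
        if not any_missing:
            # All processes of the group are running: reset clock and counter.
            self.attempts = 0
            self.last_attempt = None
            return "ok"
        if self.last_attempt is not None and now - self.last_attempt < self.delay:
            # The clock is still running: ignore recovery actions of the group.
            return "wait"
        if self.attempts >= self.tries:
            # All tries are used up: take no further action for the group.
            return "give_up"
        self.attempts += 1
        self.last_attempt = now
        return "recover"


group = GroupState(delay=300, tries=3)
for t, missing in [(0, True), (60, True), (300, True), (600, True), (900, True)]:
    print(t, group.decide(missing, now=t))
```

With delay = 300 and tries = 3, a group that never comes back triggers a recovery at 0, 300 and 600 seconds and is then left alone until all of its processes are seen running again.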

watch-and-recover's People

Contributors

thorstenkramm
