srvcheck's People

Contributors

alessandropaparella, dakk, franz-ops, lu191

srvcheck's Issues

confGetOrDefault

Utility function for configuration inquiries.

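def confGetOrDefault(conf, key, default=None):
   # Walk the nested sections following the list of key parts
   def iteOver(c, k):
      if len(k) == 1:
         return c[k[0]] if k[0] in c else default
      else:
         ke = k[0]
         k = k[1:]
         # Look the section up in the current level, not in the root config
         return iteOver(c[ke], k) if ke in c else default

   # Accept a dotted path such as 'chain.blockWindow'
   if isinstance(key, str):
      key = key.split('.')

   return iteOver(conf, key)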

And this:

self.BLOCK_WINDOW = 100 if conf["chain"]["blockWindow"] == '' else int(conf["chain"]["blockWindow"])

becomes this:

self.BLOCK_WINDOW = confGetOrDefault(conf, 'chain.blockWindow', 100)

Replace every occurrence of the old pattern.
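Note that the old one-liner also falls back to the default on an empty string and casts the value with int(), which a plain lookup does not do. A minimal sketch of a variant covering both cases, assuming the config behaves like a nested mapping; the name confGetTyped and its cast parameter are hypothetical, not part of srvcheck:

```python
from collections.abc import Mapping


def confGetTyped(conf, key, default=None, cast=None):
    """Hypothetical helper: dotted lookup, empty-string fallback, optional cast."""
    node = conf
    for part in key.split('.'):
        # Stop as soon as a section or an option is missing
        if not isinstance(node, Mapping) or part not in node:
            return default
        node = node[part]
    # Treat an empty value like a missing one, as the old one-liner did
    if node == '':
        return default
    return cast(node) if cast is not None else node


conf = {'chain': {'blockWindow': '250', 'endpoint': ''}}
print(confGetTyped(conf, 'chain.blockWindow', 100, int))
print(confGetTyped(conf, 'chain.endpoint', 'localhost'))
print(confGetTyped(conf, 'chain.missing.deep', 7))
```

The three calls cover a present value that gets cast, an empty value, and a missing path.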

Version check with Packaging

Use the packaging library for version checks, so that all cases (such as pre-releases) are handled and versions are always compared with the correct ordering.

Add a cache mechanism for tasks

A task should be able to maintain some persistent data. We have several ways to achieve this:

Task primitives

We introduce two primitives of Task: dataSet(k, v) and dataGet(k)

Persistent decorator

Add a decorator @persistent which automatically saves and restores Task data.

New service injected

Inject a new service into Task that takes care of persistence.
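As a rough sketch of how the first and third options could fit together, the two primitives can delegate to an injected store. Everything here is an assumption made for illustration: the Persistent class, the JSON file format, the per-task key prefix and the extra default argument of dataGet.

```python
import json
import os
import tempfile


class Persistent:
    """Hypothetical key/value store saved as JSON and shared by all tasks."""

    def __init__(self, path):
        self.path = path
        self.data = {}
        if os.path.exists(path):
            with open(path) as f:
                self.data = json.load(f)

    def set(self, key, value):
        self.data[key] = value
        # Write through on every change so a restart loses nothing
        with open(self.path, 'w') as f:
            json.dump(self.data, f)

    def get(self, key, default=None):
        return self.data.get(key, default)


class Task:
    def __init__(self, name, store):
        self.name = name
        self.store = store  # the injected service

    # The two primitives, namespaced by task name to avoid key clashes
    def dataSet(self, k, v):
        self.store.set(self.name + ':' + k, v)

    def dataGet(self, k, default=None):
        return self.store.get(self.name + ':' + k, default)


path = os.path.join(tempfile.mkdtemp(), 'srvcheck_data.json')
Task('TaskExample', Persistent(path)).dataSet('lastBlock', 1234)
# A restarted process would rebuild the store from the same file
restored = Task('TaskExample', Persistent(path)).dataGet('lastBlock')
print(restored)
```

A @persistent decorator could be layered on top of the same store later; it is not sketched here.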

Refactoring for generic server

Since the software is completely modular, we could in theory also use it for servers that don't run blockchain nodes. The first step is to refactor the config so that the chain section is no longer mandatory; I was thinking of moving "name" and "service" to a new section called "general":

[general]
   name = foo
   service = foo.service
   type = chain | service
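A minimal sketch of how the loader could validate such a section with the standard configparser module. The function name loadGeneral, the default of "chain" when type is missing, and the rule that only chain nodes need a [chain] section are assumptions:

```python
import configparser

SAMPLE = """
[general]
name = foo
service = foo.service
type = service
"""


def loadGeneral(text):
    parser = configparser.ConfigParser()
    parser.read_string(text)
    general = parser['general']
    # Assumption: configs without a type keep behaving as chain nodes
    kind = general.get('type', 'chain')
    if kind not in ('chain', 'service'):
        raise ValueError('general.type must be "chain" or "service"')
    # The chain section is mandatory only for chain nodes
    if kind == 'chain' and not parser.has_section('chain'):
        raise ValueError('type = chain requires a [chain] section')
    return general['name'], general['service'], kind


print(loadGeneral(SAMPLE))
```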

Use hashtag for notifications

We could improve information retrieval (especially when a single channel is used for multiple nodes) by adding a hashtag (#) next to the server name in notifications.

Remove duplicated code

We should create a new sh file called common.sh containing the helper functions used by both chains. The install script should merge common.sh with the *_monitor.sh file during installation.

Auto repairable tasks

Some tasks can auto recover from problems; for instance, if the space left is low and the /var/log is full of data, the software can clean the log by itself.

  • Task.run should return a boolean: True if it raised an alert, False otherwise
  • If run returns True, the main will call Task.canRecover
  • If canRecover returns True, the main will call Task.recover

Like shouldCheck, canRecover also checks that enough time has passed since the last call to recover; the minimum delay is specified by a parameter.
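The flow above can be sketched as follows. The recoverEvery parameter, the markRecovered helper, the runTask function and the toy TaskLogCleaner are assumptions made for illustration, not existing srvcheck code:

```python
import time


class Task:
    recoverEvery = 3600  # assumed parameter: minimum seconds between recoveries

    def __init__(self):
        self.lastRecover = None

    def run(self):
        """Return True if an alert was raised, False otherwise."""
        return False

    def canRecover(self):
        # Like shouldCheck: allow a recovery only if the last one is old enough
        if self.lastRecover is None:
            return True
        return time.time() - self.lastRecover >= self.recoverEvery

    def recover(self):
        pass

    def markRecovered(self):
        self.lastRecover = time.time()


class TaskLogCleaner(Task):
    """Toy task: reports a full disk until recover() is called."""

    def __init__(self):
        super().__init__()
        self.full = True
        self.recoveries = 0

    def run(self):
        return self.full

    def recover(self):
        self.full = False  # a real task would clean /var/log here
        self.recoveries += 1


def runTask(task):
    # One step of the main loop: try to recover only after an alert
    if task.run() and task.canRecover():
        task.recover()
        task.markRecovered()


task = TaskLogCleaner()
runTask(task)
task.full = True
runTask(task)  # skipped: the previous recovery is too recent
print(task.recoveries)
```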

Daily server status

Every day show a status message with:

  • server name
  • substrate / cosmos
  • space left (log space used)
  • uptime

Decrease cpu overhead

During task execution, the CPU spikes to ~60%; we can mitigate this overhead by adding a sleep between task executions (0.2 seconds should be enough).
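A minimal sketch of the idea, assuming tasks can be modelled as plain callables; the runAll function and its pause parameter are hypothetical names:

```python
import time


def runAll(tasks, pause=0.2):
    """Run each task callable, sleeping `pause` seconds between executions."""
    results = []
    for i, task in enumerate(tasks):
        if i > 0:
            time.sleep(pause)  # let the CPU idle between two tasks
        results.append(task())
    return results


start = time.monotonic()
out = runAll([lambda: 'a', lambda: 'b', lambda: 'c'], pause=0.05)
elapsed = time.monotonic() - start
print(out, round(elapsed, 2))
```

With N tasks this adds (N - 1) * pause seconds to each round, which should be negligible compared to the check interval.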

TaskSystemDiskUsageAlert improvements

  • Also add the space left (in GB) to the notification
  • Add a new config field system.log_size_threshold (in GB)
  • Add a new config field system.disk_limit (in %)
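A rough sketch of the first and third points using shutil.disk_usage from the standard library. The function names, the message wording and the default limit of 90% are assumptions; the log size threshold is not covered here:

```python
import shutil

GB = 1024 ** 3


def formatDiskAlert(total, used, free, diskLimit):
    """Return an alert string if the used percentage exceeds diskLimit."""
    usedPercent = used / total * 100
    if usedPercent <= diskLimit:
        return None
    return 'disk usage is above %d%% (%.1f%% used, %.1f GB left)' % (
        diskLimit, usedPercent, free / GB)


def diskAlert(path='/', diskLimit=90):
    # diskLimit would come from system.disk_limit in the config
    usage = shutil.disk_usage(path)
    return formatDiskAlert(usage.total, usage.used, usage.free, diskLimit)


print(formatDiskAlert(100 * GB, 95 * GB, 5 * GB, 90))
```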

Improve stuck notification

We need to improve stuck notifications by also including how long the node has been stuck, for example:

chain is stuck at block 0xe642f656991baa8a05a781810597f142b71bcf5ed3c828f852eb867f5671cbac since 12 minutes (3)

Task for auto updater

Create a new task that:

  1. Periodically checks whether a new version of srvcheck is available
  2. If so, pip installs the new version and sends a notification
  3. Restarts the service

This task should be disabled by default; it could be a useful tool during development.

ChainUnreachableAlert

Add a new task that checks whether the node software is reachable. This task calls Chain.isReachable or wraps a chain call in a try/except.
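A minimal sketch combining both approaches: the task calls Chain.isReachable and treats any exception as unreachable. The Task constructor, the notify callable and the BrokenChain stand-in are assumptions made for illustration:

```python
class Task:
    def __init__(self, chain, notify):
        self.chain = chain
        self.notify = notify  # callable receiving the alert text


class TaskChainUnreachable(Task):
    def run(self):
        try:
            reachable = self.chain.isReachable()
        except Exception:
            # Any failure while talking to the node counts as unreachable
            reachable = False
        if not reachable:
            self.notify('chain is unreachable')
        return not reachable


class BrokenChain:
    """Stand-in for a node whose RPC endpoint refuses connections."""

    def isReachable(self):
        raise ConnectionError('connection refused')


alerts = []
raised = TaskChainUnreachable(BrokenChain(), alerts.append).run()
print(raised, alerts)
```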

Improve modularity in config

Each class that needs config variables should declare its requirements through a new mechanism, optionally with a default value.
A new command, srvcheck.default_config, will output a default example config.

For instance, this is a task example:

class TaskPincoPallo(Task):
    def __init__(self):
        self.requiresConf(['chain.minValue', 12], ['chain.maxValue', 17], ['chain.withoutDefault', None, 'A constant without a default value'])

And srvcheck.default_config will return:

[chain]
; minValue = 12
; maxValue = 17
; withoutDefault ;; A constant without a default value
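One possible way to implement this, sketched under the assumption that requirements are collected per task instance and grouped by the section name in front of the first dot. The requirements attribute and the defaultConfig function are hypothetical:

```python
class Task:
    def __init__(self):
        self.requirements = []

    def requiresConf(self, *entries):
        # Each entry is [dotted key, default] or [dotted key, default, description]
        for entry in entries:
            key, default = entry[0], entry[1]
            description = entry[2] if len(entry) > 2 else None
            self.requirements.append((key, default, description))


class TaskPincoPallo(Task):
    def __init__(self):
        super().__init__()
        self.requiresConf(
            ['chain.minValue', 12],
            ['chain.maxValue', 17],
            ['chain.withoutDefault', None, 'A constant without a default value'])


def defaultConfig(tasks):
    """Render the collected requirements as a commented example config."""
    sections = {}
    for task in tasks:
        for key, default, description in task.requirements:
            section, option = key.split('.', 1)
            line = '; ' + option
            if default is not None:
                line += ' = ' + str(default)
            if description:
                line += ' ;; ' + description
            sections.setdefault(section, []).append(line)
    return '\n\n'.join(
        '[%s]\n%s' % (name, '\n'.join(lines)) for name, lines in sections.items())


print(defaultConfig([TaskPincoPallo()]))
```

For TaskPincoPallo this prints the same [chain] block shown above.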

Change the local_version parameter

Since the substrate monitor uses an RPC call to get the locally installed version, make the local_version parameter optional when the --git flag is used.

Stuck notification improvements

To avoid flooding the channel with stuck messages, we could do the following:

  • keep a boolean variable is_stuck that becomes true when the node goes from non-stuck to stuck
  • while is_stuck is true, send the stuck message once per hour
  • when is_stuck becomes false again, send a "the node is back in sync" message
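The three rules above can be sketched as a small state machine. The StuckNotifier class, its repeatEvery parameter and the message texts are assumptions; timestamps are passed in explicitly so the behaviour is easy to follow:

```python
class StuckNotifier:
    def __init__(self, notify, repeatEvery=3600):
        self.notify = notify
        self.repeatEvery = repeatEvery  # seconds between repeated stuck messages
        self.is_stuck = False
        self.stuckSince = None
        self.lastNotified = None

    def update(self, stuck, now):
        """Feed the result of one stuck check; `now` is a timestamp in seconds."""
        if stuck and not self.is_stuck:
            # Transition from non-stuck to stuck: notify immediately
            self.is_stuck, self.stuckSince, self.lastNotified = True, now, now
            self.notify('chain is stuck')
        elif stuck and now - self.lastNotified >= self.repeatEvery:
            # Still stuck: repeat the message at most once per interval
            self.lastNotified = now
            minutes = int((now - self.stuckSince) // 60)
            self.notify('chain is still stuck after %d minutes' % minutes)
        elif not stuck and self.is_stuck:
            self.is_stuck = False
            self.notify('the node is back in sync')


messages = []
notifier = StuckNotifier(messages.append)
for now, stuck in [(0, True), (600, True), (3600, True), (4200, False), (4800, False)]:
    notifier.update(stuck, now)
print(messages)
```

Five checks produce only three messages: the first alert, one hourly reminder, and the recovery notice.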

Task for Ram Usage alert

Add a task that checks RAM usage and sends a notification if it goes over a certain threshold, such as 85%.
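A minimal, Linux-only sketch that derives the usage from the MemTotal and MemAvailable fields of /proc/meminfo. The function names and the message wording are assumptions; a real task would read the file instead of the sample text used here:

```python
def memoryUsedPercent(meminfo):
    """Compute the used RAM percentage from the text of /proc/meminfo."""
    values = {}
    for line in meminfo.splitlines():
        name, _, rest = line.partition(':')
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])  # value in kB
    total = values['MemTotal']
    available = values['MemAvailable']
    return (total - available) / total * 100


def ramAlert(meminfo, threshold=85):
    used = memoryUsedPercent(meminfo)
    if used > threshold:
        return 'RAM usage is above %d%% (%.1f%% used)' % (threshold, used)
    return None


SAMPLE = """MemTotal:       16000000 kB
MemFree:          500000 kB
MemAvailable:    1600000 kB
"""
print(ramAlert(SAMPLE))
```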

Default optional flag

Fetch the default validator address through a query and remove it as a flag (it has no reason to exist); always set the active set through a query as well.

Parachain stuck check

When srvcheck runs on a parachain node, it only checks the block height of the parachain and not of the relay chain; we need to introduce a new substrate task, RelayChainStuck.
Its pluggable check should verify whether the running node has access to a parachain.
