systemd_mon's People

Contributors

faburem, joonty, tylerjl

systemd_mon's Issues

crash

Mär 09 10:55:47 gna.vfn-nrw.de systemd[1]: systemd_mon.service: Failed with result 'exit-code'.
Mär 09 10:55:47 gna.vfn-nrw.de audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd_mon comm="systemd" exe="/usr/lib/systemd/systemd" 
Mär 09 10:55:47 gna.vfn-nrw.de systemd[1]: systemd_mon.service: Unit entered failed state.
Mär 09 10:55:47 gna.vfn-nrw.de systemd[1]: systemd_mon.service: Main process exited, code=exited, status=1/FAILURE
Mär 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from <internal:abrt_prelude>:2:in `<compiled>'
Mär 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:39:in `require'
Mär 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from /usr/share/rubygems/rubygems/core_ext/kernel_require.rb:126:in `rescue in require'
Mär 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from /usr/share/rubygems/rubygems.rb:187:in `try_activate'
Mär 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from /usr/share/rubygems/rubygems/specification.rb:936:in `find_inactive_by_path'
Mär 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from /usr/share/rubygems/rubygems/specification.rb:748:in `stubs'
Mär 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from /usr/share/rubygems/rubygems/specification.rb:870:in `dirs'
Mär 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from /usr/share/rubygems/rubygems.rb:355:in `path'
Mär 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from /usr/share/rubygems/rubygems.rb:332:in `paths'
Mär 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from /usr/share/rubygems/rubygems.rb:332:in `new'
Mär 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: from /usr/share/rubygems/rubygems/path_support.rb:34:in `initialize'
Mär 09 10:55:47 gna.vfn-nrw.de systemd_mon[14715]: /usr/share/rubygems/rubygems/path_support.rb:71:in `path=': undefined method `+' for nil:NilClass (NoMethodError)
Mär 09 10:55:47 gna.vfn-nrw.de systemd[1]: Starting SystemdMon...

Uncaught exception

When starting systemd_mon, I get these messages in the log:

systemd[1]: Starting SystemdMon...
systemd_mon[21178]: /usr/local/rvm/gems/ruby-2.4.1/gems/ruby-dbus-0.11.2/lib/dbus/marshall.rb:301: warning: constant ::Fixnum is deprecated
systemd_mon[21178]: /usr/local/rvm/gems/ruby-2.4.1/gems/ruby-dbus-0.11.2/lib/dbus/message.rb:129: warning: constant ::Fixnum is deprecated
systemd_mon[21178]: /usr/local/rvm/gems/ruby-2.4.1/gems/ruby-dbus-0.11.2/lib/dbus/message.rb:129: warning: constant ::Fixnum is deprecated
systemd_mon[21178]: /usr/local/rvm/gems/ruby-2.4.1/gems/ruby-dbus-0.11.2/lib/dbus/marshall.rb:301: warning: constant ::Fixnum is deprecated
systemd_mon[21178]: Uncaught exception (NoMethodError) in callback: undefined method `first' for #<SystemdMon::StateValue:0x000000027da8a0>
systemd_mon[21178]: Uncaught exception (NoMethodError) in callback: undefined method `first' for #<SystemdMon::StateValue:0x00000002d15590>

The last two lines are, I think, one for each of the two monitored services.
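For readers unfamiliar with this class of error, here is a minimal sketch of how a NoMethodError like the one above arises. FakeStateValue is a hypothetical stand-in written for this example, not systemd_mon's actual StateValue class, so it shows only the general failure mode and not the real cause in the gem.

```ruby
# Hypothetical stand-in for illustration only; not systemd_mon's real StateValue.
class FakeStateValue
  # Exposes the wrapped values; the class itself defines no `first` method.
  attr_reader :values

  def initialize(*values)
    @values = values
  end
end

value = FakeStateValue.new("inactive", "activating")

begin
  value.first # raises, because FakeStateValue does not respond to `first`
rescue NoMethodError => e
  error_message = e.message
end

# Going through the wrapped array works, because Array defines `first`.
first_state = value.values.first
```

The point of the sketch is that the caller expected a collection but received a single wrapper object.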

Wildcard service names

A question/feature request: is it possible to specify a "*" as a service name to get notifications in case any service fails?
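One way such a feature could work, sketched under assumptions: treat each configured name as a shell-style glob and match it against the list of known unit names with Ruby's File.fnmatch. The match_units helper and the example unit list are hypothetical, and nothing here reflects what systemd_mon currently supports.

```ruby
# Hypothetical helper: returns the unit names that match any of the glob patterns.
def match_units(patterns, unit_names)
  unit_names.select do |name|
    patterns.any? { |pattern| File.fnmatch(pattern, name) }
  end
end

# Example unit names, invented for this sketch.
units = ["nginx.service", "certbot.service", "certbot.timer"]

all_services = match_units(["*.service"], units)
everything   = match_units(["*"], units)
```

With "*" as the only pattern, every unit in the list is selected, which corresponds to the "notify on any service" case asked about.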

Environment variables

Is there a way to reference environment variables in the YAML configuration, or override the YAML configuration using environment variables?
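A common Ruby technique for this, shown as a sketch only, is to run the YAML text through ERB before parsing it, so that ENV lookups are expanded at load time. Whether systemd_mon does or could do this is an assumption; the config keys and the SLACK_WEBHOOK_URL variable are invented for the example.

```ruby
require "erb"
require "yaml"

# Hypothetical config text; the keys are invented for this example.
config_text = <<~CONFIG
  notifiers:
    slack:
      webhook_url: <%= ENV.fetch("SLACK_WEBHOOK_URL", "unset") %>
CONFIG

# Set here only so the example is self-contained.
ENV["SLACK_WEBHOOK_URL"] = "https://hooks.example.invalid/abc"

rendered = ERB.new(config_text).result # expands the ENV lookup
config   = YAML.safe_load(rendered)    # parses the expanded YAML
webhook  = config["notifiers"]["slack"]["webhook_url"]
```

ENV.fetch with a default keeps the load from failing when the variable is missing.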

Support oneshot services (inactive vs. failure services)

Hey @joonty, I've been digging around in the code again and wanted to propose extending functionality to address use cases that include oneshot services (a github issue seems like the best place for this discussion, let me know if it belongs elsewhere.)

To illustrate, the use case I'm thinking of would be a cron replacement using timer units and oneshot service units. Each time a timer triggers its accompanying service unit, as long as systemd doesn't see a nonzero exit code, the service unit does not go into a failure state but remains inactive (dead) after briefly transitioning to activating. However, for a oneshot service that does not RemainAfterExit, this is basically still a "good" state to be in: it just means the last service execution was a success (zero exit code).

Currently, when trying to use systemd_mon with oneshot services, I get errors about calls to first in state_change.rb.

The current state paradigm indicates that inactive is a bad state, but for oneshot services, this could actually be ok. The "important" states in the case of a oneshot would be whether ActiveState is inactive or failed.

I think the easiest way to implement this may be to pass the Type of a service unit to the State constructor and, if oneshot, alter the possible ok_states and failure_states that ActiveState can take on (I think Type is a gettable property from the unit's dbus handle.)
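A rough sketch of that idea, with the caveat that every name below (STATES_BY_TYPE, classify_active_state) is hypothetical and not taken from systemd_mon's source: the service Type selects which ActiveState values count as ok and which count as failures.

```ruby
# Hypothetical sketch; the names below are not taken from systemd_mon's source.
STATES_BY_TYPE = {
  # For oneshot units, "inactive" just means the last run finished.
  "oneshot" => { ok: %w[active inactive], failure: %w[failed] },
  # For everything else, "inactive" is treated as a bad state.
  :default  => { ok: %w[active], failure: %w[inactive failed] }
}.freeze

def classify_active_state(service_type, active_state)
  states = STATES_BY_TYPE.fetch(service_type, STATES_BY_TYPE[:default])
  return :ok if states[:ok].include?(active_state)
  return :failure if states[:failure].include?(active_state)

  :other # transitional states such as "activating"
end
```

For example, classify_active_state("oneshot", "inactive") yields :ok, while the same ActiveState for a "simple" service yields :failure.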

Does this sound like a reasonable + good idea? I'm asking before coding it up because my Ruby is kind of rusty and I want to confirm this would work for the way you've got the state change algorithm set up; I don't grok it fully. 😄 If so I'll throw a PR together. The aforementioned implementation doesn't seem hard; it just needs to fit correctly into the current paradigm you've got going.

[RFE] Only alert on failed services

Hello,

Thank you for this software; it is really useful.
A feature I'd like to have on it is to get alerts/messages only in specific states, like failed, instead of knowing every time a service is reloaded/restarted (like for example after rotating its log files).

Thanks again; I had been looking for something like systemd_mon for a long time.
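To make the request concrete, here is a sketch of one possible filter: a notification is sent only when the most recent state in a unit's history is in a configured list. The notify? method and the alert_states option are hypothetical and do not describe an existing systemd_mon setting.

```ruby
# Hypothetical filter; `alert_states` stands in for a config option
# that does not exist in the source as far as this thread shows.
def notify?(state_history, alert_states)
  alert_states.include?(state_history.last)
end

alert_states = %w[failed]

# A restart (for example after log rotation) ends in "active": no alert.
restart = %w[active deactivating inactive activating active]

# A crash ends in "failed": alert.
crash = %w[active failed]

restart_alert = notify?(restart, alert_states)
crash_alert   = notify?(crash, alert_states)
```

Checking only the final state means transient states during a reload or restart never trigger a message.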

memory leaks?

I've been using systemd_mon for a little while, and I love it for its Slack integration. (Sometimes I'll just restart services just to see those little coloured messages pop up in our channel :) ) I noticed that, over a long weekend (3.5 days or so) it managed to go from 4% of memory consumption to 20% (on a 768MB VPS). I'm using both e-mail and the aforementioned Slack notifications.

I can help with debugging if you like, but I'll forewarn you that I'm by no means a Ruby pro.
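For scale, 4% and 20% of 768MB are roughly 31MB and 154MB. As a starting point for debugging, a sketch like the following samples Ruby's own heap statistics via GC.stat so that growth can be compared between two points in time. The heap_snapshot helper is hypothetical, and it only sees Ruby-level objects, so memory held by native extensions would not show up here.

```ruby
# Returns a small snapshot of Ruby-level heap usage (hypothetical helper).
def heap_snapshot
  stats = GC.stat
  {
    time: Time.now,
    live_slots: stats[:heap_live_slots],
    total_allocated: stats[:total_allocated_objects]
  }
end

before = heap_snapshot

# Deliberately retained objects, standing in for whatever might be leaking.
retained = Array.new(10_000) { |i| "leak-#{i}" }

after  = heap_snapshot
growth = after[:total_allocated] - before[:total_allocated]
```

Logging such snapshots periodically from a long-running process would show whether the live slot count keeps climbing between notifications.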

Ruby gem is missing hipchat.rb

Thanks for an awesome gem. I'm pretty new to systemd and just got some watchdog monitoring setup, so I was glad to find this gem to keep tabs on service changes without constantly scanning logs.

Anyway, just wanted to report that version 0.1.0 installed via "gem install" is missing hipchat.rb

I see now that hipchat.rb was actually added after 0.1.0 so probably just need a new version tag and update the gem.

oneshot services incorrectly reported as `still failed`

I see that oneshot services are supported (#3), but it's not working well for me.

For example, I have the certbot.service unit added to the config.
This is a oneshot service running on a timer to manage SSL cert renewal.

$ sudo systemctl status certbot.service
● certbot.service - Certbot
   Loaded: loaded (/lib/systemd/system/certbot.service; static; vendor preset: enabled)
   Active: inactive (dead) since Mon 2018-02-12 12:45:13 CST; 50s ago
     Docs: file:///usr/share/doc/python-certbot-doc/html/index.html
           https://letsencrypt.readthedocs.io/en/latest/
  Process: 17261 ExecStart=/usr/bin/certbot -q renew (code=exited, status=0/SUCCESS)
 Main PID: 17261 (code=exited, status=0/SUCCESS)

Feb 12 12:45:12 example.com systemd[1]: Starting Certbot...
Feb 12 12:45:13 example.com systemd[1]: Started Certbot.
$ sudo systemctl --failed
0 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

This service is running fine; it exited without error.

But using slack-notifier, I'm getting this every time certbot.service runs:

Alert: systemd unit certbot.service on example.com still failed
Hostname
example.com
Unit
certbot.service
Active
inactive
Status
dead

possible false positives on oneshot type services without 'RemainAfterExit=yes'

First, a word of gratitude for this systemd monitoring app. In all honesty, I was using https://github.com/gkarakou/systemd-denotify for quite a while on desktops, but recently I was looking for something more geared towards servers when stumbling onto this project. Generally works great for my use cases.

Only one issue so far: when setting up a systemd unit of type 'oneshot' that doesn't have 'RemainAfterExit=yes' (like the default logrotate.service on archlinux), I see some errors and systemd_mon (erroneously) notifies via email. I have no clue whether this is expected behaviour or a bug; my systemd knowledge is far from 'developed'. It could be related to the recently integrated pull request supporting oneshot type services (#3), but as that happened before I started to use systemd_mon, this is something I cannot judge.

What happens on the logrotate.service unit:

(1) without RemainAfterExit=yes --> Active: inactive (dead)
      ==> systemd_mon starts notifying (repeatedly)
      unexpected: systemctl --failed --all doesn't report anything for logrotate.service

(2) with RemainAfterExit=yes --> Active: active (exited)
      ==> systemd_mon doesn't notify
     expected

I have worked around this issue by adding /etc/systemd/system/logrotate.service, which differs only in the 'RemainAfterExit=yes' part. Yet it might prove useful to report this issue, hope it doesn't cause too much confusion :-)

Some debug info on the issue:

$ cat /usr/lib/systemd/system/logrotate.service
[Unit]
Description=Rotate log files

[Service]
Type=oneshot
ExecStart=/usr/bin/logrotate /etc/logrotate.conf
Nice=19
IOSchedulingClass=best-effort
IOSchedulingPriority=7

$ sudo systemctl --failed
0 loaded units listed. Pass --all to see loaded but inactive units, too.
To show all installed unit files use 'systemctl list-unit-files'.

$ sudo systemctl status -l logrotate.service
● logrotate.service - Rotate log files
Loaded: loaded (/usr/lib/systemd/system/logrotate.service; static; vendor preset: disabled)
Active: inactive (dead) since Mon 2015-12-07 21:39:04 UTC; 41min ago
Main PID: 28010 (code=exited, status=0/SUCCESS)

Dec 07 21:39:04 do16 systemd[1]: Starting Rotate log files...
Dec 07 21:39:04 do16 systemd[1]: Started Rotate log files.

$ systemd_mon ~/.systemd_mon_testing.yml
SystemdMon::Notifiers::Email doesn't respond to 'notify_start!', not sending notification
Monitoring changes to 13 units

Using notifiers: SystemdMon::Notifiers::Email

SystemdMon::State:0x00000002e70228

SystemdMon::State:0x000000027a8300

SystemdMon::State:0x00000001dcf360

SystemdMon::State:0x00000001ca02f0

SystemdMon::State:0x00000001330918

logrotate.service failed: inactive (dead)
Uncaught exception (NoMethodError) in callback: undefined method `first' for #<SystemdMon::StateValue:0x00000001329d48>

SystemdMon::State:0x00000001d5ab00

SystemdMon::State:0x00000001e866f0

SystemdMon::State:0x000000027e11a0

SystemdMon::State:0x00000002920ed0

SystemdMon::State:0x00000002a9d808

SystemdMon::State:0x00000002b8ac20

SystemdMon::State:0x00000002c60028

SystemdMon::State:0x00000002d21f98 [*]

logrotate.service still failed: inactive (dead)
active state changed from inactive to inactive then activating then inactive

Notifying state change of logrotate.service via SystemdMon::Notifiers::Email
SystemdMon::Notifiers::Email: Sending email to [email protected]:
SystemdMon::Notifiers::Email: -> Subject: "Alert: logrotate.service on do16: still failed"
SystemdMon::Notifiers::Email: -> Message: "Systemd unit logrotate.service on do16 still failed: inactive (dead)

| Time               | Active     |
| 22:29:48.815 +0000 | inactive   |
| 22:30:14.110 +0000 | inactive   |
| 22:30:14.114 +0000 | activating |
| 22:31:17.794 +0000 | inactive   |

Regards, SystemdMon"
SystemdMon::Notifiers::Email: sent email notification

SystemdMon::State:0x00000002d1b1e8

[*] running commands in another terminal window
$ sudo systemctl start logrotate.service
$ sudo systemctl status -l logrotate.service
● logrotate.service - Rotate log files
Loaded: loaded (/usr/lib/systemd/system/logrotate.service; static; vendor preset: disabled)
Active: inactive (dead) since Mon 2015-12-07 22:25:20 UTC; 9s ago
Process: 28278 ExecStart=/usr/bin/logrotate /etc/logrotate.conf (code=exited, status=0/SUCCESS)
Main PID: 28278 (code=exited, status=0/SUCCESS)

Dec 07 22:25:20 do16 systemd[1]: Starting Rotate log files...
Dec 07 22:25:20 do16 systemd[1]: Started Rotate log files.
