george3d6 / inquisitor

An easily extensible, minimal footprint monitoring tool. (Still in the testing phase)

License: BSD 2-Clause "Simplified" License

Rust 72.15% CSS 10.02% HTML 2.10% JavaScript 15.72%

inquisitor's Introduction

I write hopefully-novel thoughts around machine learning, epistemology, statistics, and programming on epistem.ink.

Currently working on an automatic machine learning tool.

My other open-source projects are either WIP or unmaintained. Let me know if you find any of them useful, and that situation might change.

inquisitor's People

Forkers

deedasmi alexmaco learnmonitoring pizzashift

inquisitor's Issues

System monitor plugin graphs should be more easily readable [Web ui]

Total memory and used memory graphs should be merged together, where total memory is used as the upper limit for the graph (0 being the lower limit)
FS space and used FS space graphs should be merged together, where total FS space is used as the upper limit for the graph (0 being the lower limit)
Consider merging the network in and network out graphs together as well, since one could limit the other and it could be useful for stuff like measuring the latency of a server to a large request (or requests).

Webui graph grouping and filtering [Web ui]

Web ui should group graphs under some sort of headers and not generate them all together.

My idea atm is to:

a) Have graphs grouped by machine (with only 2 or 3 randomly selected machines showing by default)
b) Add some filter button that allows you to filter by specific machines or specific keys

Pushing agent configuration updates

Would be neat to update agent configurations remotely. Blocked on #7

Comparator operator bug?

While running clippy on the receptor, it pointed out this:

Both of these blocks are the same. Is the second one supposed to be an 'in' or like comparison?

Figure out a way to protect command_runner.yml

~~Trivial code execution/privilege escalation depending on the environment. Might use https://doc.rust-lang.org/std/fs/struct.Permissions.html, haven't tested it.~~

Also, could send a hash of the config file with the results. That way the receptor would be warned any time the config was changed.
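
A minimal sketch of the config-hash idea, using only the standard library. The names `config_fingerprint` and `config_changed` are hypothetical and not part of the codebase. `DefaultHasher` is not cryptographic, so this only detects accidental or casual changes; protecting against a deliberate attacker would need a real digest (e.g. SHA-256 from an external crate).

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Fingerprint of a config file's contents (hypothetical helper).
/// NOTE: DefaultHasher is not cryptographic and its output is not
/// guaranteed to be stable across Rust versions, so fingerprints should
/// only be compared between reports coming from the same agent build.
pub fn config_fingerprint(contents: &[u8]) -> u64 {
    let mut hasher = DefaultHasher::new();
    contents.hash(&mut hasher);
    hasher.finish()
}

/// Receptor-side check (hypothetical): true when an earlier fingerprint
/// exists and differs from the one that was just reported.
pub fn config_changed(last_seen: Option<u64>, reported: u64) -> bool {
    matches!(last_seen, Some(previous) if previous != reported)
}
```

The agent would attach the fingerprint to each result, and the receptor would raise a warning whenever `config_changed` returns true for that agent.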

Discussion around endpoints

Should the web UI be its own endpoint rather than a default part of the receptor?

Should endpoints be implemented as receptor plugins that listen for agent events and send data when they arrive, rather than as separate entities that query the receptor for whatever data they need and send warnings based on that?

What kind of endpoints should be implemented (e.g. Slack, IRC, email, Twilio), and how focused and/or generic should they be?

How much logic should be part of the endpoints? (E.g. should a receptor plugin check whether f(X) => warning, with the endpoints only reading said warning, or should you be able to define said f in an endpoint?)

Design the first endpoint (outside of the web ui)

Add an endpoint outside of the existing web UI for the 0.3.2 release in order to showcase it and to start thinking a bit more about how to implement endpoints in general.

Probably a Slack warning one and/or an email one.

--config flag

I think both agent and receptor would benefit from a run time configuration argument. Particularly since building currently overwrites existing configs.

Add a compression layer to the agent -> receptor communication

Just what the title of the issue says: nothing heavyweight, but it'd be good to have the possibility of a layer that can gzip/ungzip the messages sent between the two.

This could also serve as a model for adding additional layers (e.g. auth and encryption).

It's not really urgent, since message size isn't that big an issue, but it's a small improvement that wouldn't be hard to implement or affect the logic of the rest of the code much.

Maybe have it configurable in the config files of the agent, something like:

compression:
    type: string (default none, other possibilities being gzip and maybe later xzip or bzip)
    level: int (default 6)
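
A rough sketch of how those two fields might look on the Rust side once parsed. `CompressionType`, `CompressionConfig` and `from_fields` are hypothetical names; only the `none`/`gzip` values and the default level of 6 come from the proposal above, and the 0 to 9 range is the usual gzip level range.

```rust
/// Supported compression kinds (hypothetical enum; only "none" and "gzip"
/// are taken from the proposal above).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CompressionType {
    None,
    Gzip,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct CompressionConfig {
    pub kind: CompressionType,
    pub level: u32,
}

impl Default for CompressionConfig {
    /// Defaults from the proposal: no compression, level 6.
    fn default() -> Self {
        CompressionConfig { kind: CompressionType::None, level: 6 }
    }
}

impl CompressionConfig {
    /// Builds a config from the raw (optional) values read out of the
    /// config file, falling back to the defaults for missing fields.
    pub fn from_fields(kind: Option<&str>, level: Option<u32>) -> Result<Self, String> {
        let defaults = CompressionConfig::default();
        let kind = match kind {
            None | Some("none") => CompressionType::None,
            Some("gzip") => CompressionType::Gzip,
            Some(other) => return Err(format!("unsupported compression type: {}", other)),
        };
        let level = level.unwrap_or(defaults.level);
        if level > 9 {
            return Err(format!("compression level out of range (0-9): {}", level));
        }
        Ok(CompressionConfig { kind, level })
    }
}
```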

Plugin configuration

It should be fairly simple to configure plugins based on individual .yml files, or one master .yml file that contains configuration for all plugins.

Change fs paths from &str and String to PathBuf or Path

Supporting configs for non-local plugins

The agent build script currently assumes that the included plugins are at a static path. Moving forward with allowing git paths and crates.io plugins, this doesn't work.

If plugins assume they are disabled when no configuration option is found, we could simply remove this behavior from the build script and distribute the config files as part of the tarball.

Touches #39

Documentation

This is just a tracking issue for me to remember what documentation I need to write.

Document inquisitor_lib and requirements for plugin authors.
Document agent/plugins/cargo.toml.
Document configuration files.
Update /README.md with new compile/config workflow.
Update version numbers of all plugins

Minimum Rust Version

Do we care about maintaining some specific rust version? Or just 'will always work on stable'?

New stable (1.25) has a new use feature, but don't want to move us to it if we're looking to support some older version.

Default Plugin Error Management

This is a high-level issue to track the 'safety' of the default plugins. They should unwrap or expect as rarely as possible. Now that new() returns a Result, we can greatly reduce the chance of our plugins panicking. I'll check these off as I fix them.

Plugin discussion

Right now the plugin system is inlined into the codebase via a custom build script. This has a couple of disadvantages, and only one advantage that I can see.

The code isn't pretty, and is likely more error prone.
Lots of additional build artifacts, which are currently in the src folder.
Because we can't really track the plugins we load, we have to busy wait on the loop for something to be ready. This is a waste of resources.

The advantage is that no code inside the agent folder ever has to be touched in order to customize the build.

I threw together a proof of concept build here.
Only the agent is done currently. This build replaces the inlining system with a macro that writes a plugin module with all active plugins, and returns a vector of trait objects representing our active plugins. Here is my reasoning:

Pros:

Much simpler build process.
Plugins can now be entire rust modules instead of one .rs file (see the file_checker plugin)
We have a list of all plugins, which means we can ask when the next run should be and sleep the entire agent until that time.
We can effectively gitignore agent/src/plugins/mod.rs after adding the bare bones version to prevent accidental upstream.

Cons:

We only have trait objects, which means we won't be able to reference the plugin type or any attributes.
This is my first time using trait objects, so I'm kind of just hoping this won't bite us in the ass :P

Further suggestions:
Plugin::new() should return a result, which the plugin initialization macro/file can check and simply drop plugins that are not configured. This would allow us to distribute an agent binary that supports all plugins, but only runs against configured plugins.
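
To make the trait-object idea concrete, here is a small sketch. It is not the proof-of-concept code: the `Plugin` trait shape, the `Alive` constructor argument, and the helpers `keep_configured` and `time_until_next_run` are all assumptions made for illustration. It shows plugins whose constructors return a `Result` being filtered down to the configured ones, and the agent computing how long it can sleep until the next plugin is due.

```rust
use std::time::{Duration, Instant};

/// Assumed plugin interface; the real trait in the agent may differ.
pub trait Plugin {
    fn name(&self) -> &'static str;
    /// When this plugin next wants to run.
    fn next_run(&self) -> Instant;
    fn gather(&mut self) -> Result<String, String>;
}

pub struct Alive {
    next: Instant,
}

impl Alive {
    /// `enabled` stands in for whatever the plugin reads from its config.
    pub fn new(enabled: bool) -> Result<Self, String> {
        if enabled {
            Ok(Alive { next: Instant::now() })
        } else {
            Err("alive: not configured".to_string())
        }
    }
}

impl Plugin for Alive {
    fn name(&self) -> &'static str {
        "alive"
    }
    fn next_run(&self) -> Instant {
        self.next
    }
    fn gather(&mut self) -> Result<String, String> {
        // Schedule the next run one minute from now (arbitrary interval).
        self.next = Instant::now() + Duration::from_secs(60);
        Ok("alive".to_string())
    }
}

/// Drops every plugin whose constructor returned an error.
pub fn keep_configured(
    candidates: Vec<Result<Box<dyn Plugin>, String>>,
) -> Vec<Box<dyn Plugin>> {
    candidates.into_iter().filter_map(Result::ok).collect()
}

/// How long the agent could sleep before any plugin is due; None if there
/// are no plugins at all.
pub fn time_until_next_run(plugins: &[Box<dyn Plugin>], now: Instant) -> Option<Duration> {
    plugins
        .iter()
        .map(|plugin| plugin.next_run())
        .min()
        .map(|next| next.saturating_duration_since(now))
}
```

With this shape the main loop can sleep for `time_until_next_run(...)` instead of busy waiting, at the cost of only seeing plugins through the trait.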

Upgrade build system

Clean up the build.rs files (maybe move shared functions in a common place) and add back the functionality of transporting the yml config files and the service file in the build directory.

Handle error for plugin's gather method

Plugins should send errors upstream when the gather method fails. This also means we have to think about how to handle these errors. I'm thinking of making it configurable and giving a choice between:

a) Ignoring them

b) Logging them

c) Sending them to the receptor
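
The three choices above could be modelled roughly like this. `GatherErrorPolicy`, `ErrorAction` and `handle_gather_error` are hypothetical names, and the actual sending to the receptor is left to the caller.

```rust
/// The three configurable behaviours listed above (hypothetical enum).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum GatherErrorPolicy {
    Ignore,
    Log,
    SendToReceptor,
}

/// What the agent should do next with a failed gather.
#[derive(Debug, PartialEq, Eq)]
pub enum ErrorAction {
    Nothing,
    Logged(String),
    Forward(String),
}

/// Applies the configured policy to one gather failure.
pub fn handle_gather_error(policy: GatherErrorPolicy, plugin: &str, error: &str) -> ErrorAction {
    let message = format!("plugin '{}' failed to gather: {}", plugin, error);
    match policy {
        GatherErrorPolicy::Ignore => ErrorAction::Nothing,
        GatherErrorPolicy::Log => {
            // stderr stands in for whatever logging library ends up being used.
            eprintln!("{}", message);
            ErrorAction::Logged(message)
        }
        // The caller is expected to queue this message for the receptor.
        GatherErrorPolicy::SendToReceptor => ErrorAction::Forward(message),
    }
}
```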

Agent/Receptor Authentication

It might be in our best interest to authenticate agents in some way.

Merge agent, receptor and shared library

At the moment they share a lot of functionality and the 3 of them existing separately is too much mental overhead for the sake of separation.

The shared_lib is included in both of them anyway.

I can't see anything bad happening by just merging them under the same crate.

Cleanup for 0.3.1

There's a bunch of stuff that needs some cleaning up before 0.3.1

Cleanup code in the endpoints
Cleanup code in the new plugin
The addition of the "standard" config system to the receptor's and agent's main.rs
Cleaning up useless (or rarely used) dependencies
Upgrading dependency versions to the latest usable ones
Standardizing naming across config files and coming up with better defaults for them
Cleaning up code in general (this is an ongoing issue, but the codebase has grown quite a lot and I want to try and shrink it a bit before the release)
Documenting the code via comments inside the code (There's some annoying logic going on in there which I'd rather document).

Separate web_ui from receptor (as its own endpoint, for now)

The web UI should be an endpoint, and definitely not part of the "core" of the receptor; we need to separate it.

Support long-running plugins

Currently the agent basically locks on waiting for a plugin to return. gather should probably be a threaded/asynchronous event.

This used to be in the roadmap, but no issue was created for it. Tagging 1.0, but it should be done significantly sooner than that.
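
One possible shape for this, sketched with plain threads and a channel from the standard library. `spawn_gather` is a hypothetical helper, and the closure stands in for a plugin's gather call; an async runtime would be another option.

```rust
use std::sync::mpsc::{self, Receiver};
use std::thread;

/// Runs one gather call on its own thread and hands back a channel the
/// agent can poll (try_recv) or block on (recv) without stalling the
/// other plugins.
pub fn spawn_gather<F>(
    name: &'static str,
    gather: F,
) -> Receiver<(&'static str, Result<String, String>)>
where
    F: FnOnce() -> Result<String, String> + Send + 'static,
{
    let (sender, receiver) = mpsc::channel();
    thread::spawn(move || {
        // If the agent dropped the receiver, nobody is waiting for the
        // result any more, so a failed send is deliberately ignored.
        let _ = sender.send((name, gather()));
    });
    receiver
}
```

The agent loop could keep one receiver per running plugin and call `try_recv` on each pass, so a slow plugin only delays its own result.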

Change plugin name to a &'static str

Since the server kinda relies on plugin name, I'd like to suggest it be a &'static str instead of a String. The plugin name should be known at compile time and not dependent on some runtime option.

Publish inquisitor_lib

So plugin authors can depend on it.

Probably only do this at time of release.

SemVer convention for plugin authors

SemVer is reasonably clear, but we should explicitly define what we expect from plugins.

Recommendation:

Major version number should match agent_lib or receptor_lib
Minor version should signify changes in data or data format. Renaming fields, adding new data, etc. Any change that would break a schema'd system
Patch versions should signify performance, bug fixes, data details, etc. What specific data is returned can change (trimming a log file for example) and still be a patch only bump.

Naming convention

With the idea of publishing many of our sub-crates to crates.io in order to support plugin developers and future users, we need to decide on an official naming convention for us and a recommended naming convention for other authors.

Primary concerns:

"shared_lib" is super generic and doesn't reflect what it is for.
Future plugin idea's basic name may already be taken on crates.io.
Users should be able to clearly see what plugins are agent plugins and which ones are receptor plugins by name alone.

Possible solutions:

Long form: prefix every plugin with 'inquisitor_agent' or similar. 'inquisitor_agent_command_runner' is unwieldy.
Short form: prefix every plugin with "iq_agent" or similar. "iq_receptor" is still kind of long.
Ultra-short form: "iqa_command_runner", "iqr_sync_check".

Personally, I like the short form, number 2. Following this, our libs would be "iq_agent_lib", "iq_shared_lib", and "iq_receptor_lib"

Note that this issue should only be closed when the changes are implemented, not just when the discussion is finished.

Refactor receptor plugins

I plan to move receptor plugins to a system similar to #4, but there is an additional plugin case I'd like to explore.

Currently, plugins run externally to the messages received from agents. This allows them to monitor the database in some way, but that's about it.

I'd like to also define a 'status plugin' that is sent an immutable copy of every status update from the receivers. This would allow plugins to do some parsing of messages and do additional logging, e-mail alerts, and the like.

This could be done via the ReceptorPlugin trait (default trait method that does nothing, can be overridden by plugins that want to use it), or a new plugin type. Thoughts?
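
A sketch of the first option, a default trait method that does nothing. Only the `ReceptorPlugin` trait name comes from the text above; the `Status` fields, the `on_status` method, the `AlertCounter` plugin and the `dispatch` helper are assumptions made for illustration.

```rust
/// Assumed shape of a status update coming from an agent.
#[derive(Debug, Clone, PartialEq)]
pub struct Status {
    pub sender: String,
    pub plugin_name: String,
    pub message: String,
}

pub trait ReceptorPlugin {
    fn name(&self) -> &'static str;

    /// Called with an immutable view of every status update. The default
    /// does nothing, so existing plugins keep compiling unchanged.
    fn on_status(&mut self, _status: &Status) {}
}

/// A plugin that does not care about status updates.
pub struct SyncCheck;

impl ReceptorPlugin for SyncCheck {
    fn name(&self) -> &'static str {
        "sync_check"
    }
}

/// A plugin that opts in by overriding the default method.
pub struct AlertCounter {
    pub alerts: usize,
}

impl ReceptorPlugin for AlertCounter {
    fn name(&self) -> &'static str {
        "alert_counter"
    }
    fn on_status(&mut self, status: &Status) {
        if status.message.contains("error") {
            self.alerts += 1;
        }
    }
}

/// Hands one status update to every plugin.
pub fn dispatch(plugins: &mut [Box<dyn ReceptorPlugin>], status: &Status) {
    for plugin in plugins.iter_mut() {
        plugin.on_status(status);
    }
}
```

The trade-off against a separate plugin type is that every receptor plugin gets the hook, whether or not it makes sense for it.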

Warn on SemVer incompatibilities

The agent and receptor should provide compilation warnings when plugins have a lower 'major release' version than the agent/receptor itself, e.g. agent version 1.0.0, Alive plugin version 0.9.5.

This should also apply to agent_lib, shared_lib, etc.
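
The comparison itself is simple; here is a sketch with hypothetical helper names. Emitting it as a compile-time warning would additionally need a build script or macro that can see both version strings, which is not shown here.

```rust
/// Extracts the major component of a "MAJOR.MINOR.PATCH" version string.
pub fn major_version(version: &str) -> Option<u64> {
    version.split('.').next()?.parse().ok()
}

/// Some(true) when the plugin's major version is lower than the host's
/// (agent or receptor); None when either version cannot be parsed.
pub fn plugin_is_behind(host_version: &str, plugin_version: &str) -> Option<bool> {
    Some(major_version(plugin_version)? < major_version(host_version)?)
}
```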

Default config generation

So I was thinking of how to help solve the issue of config files.

I thought of two options. One is an API change, the other isn't but is a little more unstable.

First, the unstable one:

Have our plugins catch 'file not found' errors, and create a base config that marks it as disabled, then write that config to disk in the given config directory. The benefit is that it won't be required by all plugins.

The second requires an API change:

Require a plugin::generate_config function that returns a serialized config that we write to disk. This would basically require every plugin to have serde, as well as the additional function definition. This would allow us to have an agent.exe --build-config option. This would also be another thing not documented in inquisitor_lib (but would be documented in the plugin creation guide) that plugins would have to implement.
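
A minimal sketch of the first option (catching 'file not found' and writing a disabled base config). `load_or_create_config` is a hypothetical helper, and the default contents are whatever the plugin considers a disabled config.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Reads the config at `path`. If the file does not exist, writes
/// `default_contents` there and returns it instead. Any other I/O error
/// (permissions, missing directory, ...) is passed through to the caller.
pub fn load_or_create_config(path: &Path, default_contents: &str) -> io::Result<String> {
    match fs::read_to_string(path) {
        Ok(contents) => Ok(contents),
        Err(err) if err.kind() == io::ErrorKind::NotFound => {
            fs::write(path, default_contents)?;
            Ok(default_contents.to_string())
        }
        Err(err) => Err(err),
    }
}
```

A plugin would call this with something like `enabled: false` as the default, so a freshly generated config never activates the plugin by accident.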

Release 0.4 proposal

I'd like to propose that 0.4 be a 'You're welcome to make your own plugins' semi-stable release. Here is a shortlist of items that need to be addressed first:

Document agent_lib, receptor_lib, and shared_lib
Create a 'plugin creation guide' outlining all plugin requirements
Decide on a naming convention
Publish agent_lib, receptor_lib, and shared_lib to crates.io
- Switch plugins from using path-dependent dependencies to crates dependencies
Possibly pull plugins out to a separate repo.
Start following SemVer

Other notes

I still want to work on #6, at least get it proofed out, which would be a breaking change.
I don't think #5 or #7 will be breaking changes.

Macros for receptor plugins

Add macros to inline construction and usage of receptor plugins (similar to what's being done in the agent)

Process counter plugin counts sub-processes, should it keep doing so?

The process counter plugin seems to count sub-processes atm, so if someone is running something like:

julia -p 4 blah.jl

or a Python program using the subprocess module, it will count the number of child processes in addition to the master process.

If there's an easy way to avoid this it should be done, since many people might use this to monitor an app written in Python, Node or a similar scripting language, which uses subprocesses rather than threads for parallelism.
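
One way this might be approached, assuming the plugin can get each process's parent PID: count only the matching processes whose parent is not itself a match. `ProcessInfo` and `count_root_processes` are hypothetical, and how the process list is obtained is out of scope here.

```rust
use std::collections::HashSet;

/// Assumed minimal view of a running process.
pub struct ProcessInfo {
    pub pid: u32,
    pub parent_pid: u32,
    pub name: String,
}

/// Counts processes called `name` whose parent is not also called `name`,
/// so workers spawned by a master process are not counted separately.
pub fn count_root_processes(processes: &[ProcessInfo], name: &str) -> usize {
    let matching_pids: HashSet<u32> = processes
        .iter()
        .filter(|process| process.name == name)
        .map(|process| process.pid)
        .collect();
    processes
        .iter()
        .filter(|process| process.name == name && !matching_pids.contains(&process.parent_pid))
        .count()
}
```

This only collapses children that share the parent's process name; subprocesses running a different executable would not match the name filter in the first place.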

Agent Robustness

One of the things mentioned in a previous discussion is that plugins should not crash the agent. I'm working on implementing std::panic::catch_unwind, and ran into a problem. Mutable function calls are not UnwindSafe, and can't be caught.

This means that plugin.gather() can't be put inside a catch_unwind(). I do believe that gather should be mutable though.

I can catch initialization errors with another change to plugins (https://github.com/Deedasmi/Inquisitor/blob/master/agent/plugins/src/plugin_initialization.rs#L8). Do you think that's enough? Or do you think this is unnecessary?
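
For what it's worth, the standard library provides `std::panic::AssertUnwindSafe`, a wrapper that lets a closure capturing a mutable reference be passed to `catch_unwind`. A sketch follows, with an assumed `Plugin` trait and hypothetical `gather_guarded` helper and `Flaky` plugin. The caveats are that the plugin may be left in an inconsistent state after a panic, and that nothing is caught when the binary is built with `panic = "abort"`.

```rust
use std::panic::{self, AssertUnwindSafe};

/// Assumed plugin interface with a mutable gather method.
pub trait Plugin {
    fn gather(&mut self) -> Result<String, String>;
}

/// Calls gather and turns a panic into an ordinary error. AssertUnwindSafe
/// is a promise by the caller that a half-updated plugin is acceptable
/// (for example because the agent drops or re-creates it afterwards).
pub fn gather_guarded(plugin: &mut dyn Plugin) -> Result<String, String> {
    match panic::catch_unwind(AssertUnwindSafe(|| plugin.gather())) {
        Ok(result) => result,
        Err(_) => Err("plugin panicked during gather".to_string()),
    }
}

/// Example plugin that panics on its first call only.
pub struct Flaky {
    pub calls: u32,
}

impl Plugin for Flaky {
    fn gather(&mut self) -> Result<String, String> {
        self.calls += 1;
        if self.calls == 1 {
            panic!("boom");
        }
        Ok(format!("call {}", self.calls))
    }
}
```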

Error handling for plugin initialization method

Plugins should never crash the agent if initialization doesn't work; instead, the agent should just log an error. Same goes for the receptor.

Logging

Really needs some proper logging. My test environment isn't working and I'm not sure why. Do you have a preference on library?

Add receptor plugin for comparing values

The receptor needs a plugin that examines values gotten from the agent. It should take a path from where to get value[s]:

machine: any, plugin: blax, key: ['results', '[2]', 'value']

and an operation:

operator: ['<' , '211']

If the operation returns 'True' it logs a status into the database, otherwise it does nothing.

This sort of plugin could be used to easily send warnings if various values are out of whack (without having to change the endpoint's configuration for each new warning).
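
The operation part could be sketched as below. Only the `'<'` operator and the `['<', '211']` shape come from the example above; the other operators, the `Operator` enum and `should_log_status` are assumptions, and resolving the key path into a value is not shown.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Operator {
    LessThan,
    GreaterThan,
    Equal,
}

impl Operator {
    /// Maps the configured symbol to an operator. Only "<" appears in the
    /// example above; ">" and "==" are assumed extras.
    pub fn parse(symbol: &str) -> Option<Operator> {
        match symbol {
            "<" => Some(Operator::LessThan),
            ">" => Some(Operator::GreaterThan),
            "==" => Some(Operator::Equal),
            _ => None,
        }
    }

    pub fn apply(self, value: f64, operand: f64) -> bool {
        match self {
            Operator::LessThan => value < operand,
            Operator::GreaterThan => value > operand,
            Operator::Equal => value == operand,
        }
    }
}

/// Evaluates a configured `[symbol, operand]` pair against a value taken
/// from an agent message. Ok(true) means a status should be logged.
pub fn should_log_status(operator: [&str; 2], value: f64) -> Result<bool, String> {
    let op = Operator::parse(operator[0])
        .ok_or_else(|| format!("unknown operator: {}", operator[0]))?;
    let operand: f64 = operator[1]
        .parse()
        .map_err(|_| format!("operand is not a number: {}", operator[1]))?;
    Ok(op.apply(value, operand))
}
```

Giving each operator its own match arm also makes it harder for two comparison branches to end up identical by accident.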

Make agent and receptor lib re-export shared lib

Plugins shouldn't depend on shared_lib. We can make agent_lib depend on shared_lib and re-export all the functions so that plugins can use them while only depending on agent_lib. Same with receptor_lib.