autometrics-dev / am

Autometrics Companion CLI app

License: Apache License 2.0

HTML 0.55% Rust 95.65% Scheme 3.12% Smarty 0.69%

am's Issues

Automatically load the SLO rules into prometheus

When using am start we should also add the rules for the SLOs.

These alerting rules can be used in the following way:

Start up an Alertmanager instance
Have the alertmanager monitor the prometheus instance
Once an alert happens route this to am
Expose this event in websocket connection

Explorer can then connect to this ws connection and then show something in the UI once it happens.

Alternatively we can add the monitoring of these metrics directly to am but that can still expose the event through a ws connection.

Figure out how we want to present logs from am

Currently we just use the tracing crate with the standard tracing_subscriber printing every log line to stderr.

This output carries quite a bit of extra detail, such as the timestamp and the location in the code where the log originated. That can be useful when debugging the application itself, but it can be overwhelming.

Do we go with the approach that we used in fp and create a different format which removes a bunch of things, such as the timestamp, call site, and fields associated with the log message, and then allow the user to pass in the --verbose flag to get the default tracing_subscriber?

Add support for Grafana

This would allow am to automatically download Grafana and hook it up to the selected Prometheus instances.

It should also automatically setup the autometrics dashboard, and allow for a user to supply their own custom dashboard to be loaded automatically (this could possibly be a separate ticket).

Store prometheus config in unique location

Before am starts Prometheus, it will write the config to disk. Currently the same location is used all the time. This will be a problem if multiple am instances are started, since the configuration will be overwritten.

am should write the Prometheus config to a semi-random file in .autometrics and clean it up once am start finishes. If the ephemeral flag is given, then we should not store it in .autometrics but in a temporary file.
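A minimal sketch of how such a location could be chosen. The function name, the file name pattern, and the use of the process ID plus a timestamp as the "semi-random" part are assumptions made for illustration, not am's actual implementation.

```rust
use std::env;
use std::path::PathBuf;
use std::process;
use std::time::{SystemTime, UNIX_EPOCH};

/// Hypothetical helper: decide where to write the Prometheus config.
/// The "semi-random" part is built from the process ID and the current
/// sub-second nanoseconds, so two am instances are unlikely to collide.
fn prometheus_config_path(ephemeral: bool) -> PathBuf {
    let nanos = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .map(|d| d.subsec_nanos())
        .unwrap_or(0);
    let file_name = format!("prometheus-{}-{}.yml", process::id(), nanos);

    // Ephemeral runs use the system temp directory instead of .autometrics.
    let dir = if ephemeral {
        env::temp_dir()
    } else {
        PathBuf::from(".autometrics")
    };
    dir.join(file_name)
}
```

The caller would still be responsible for removing the file once am start finishes, for example with std::fs::remove_file.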

Redirect user to explorer when visiting the root

When a user visits the root of the webserver, they should be redirected to /explorer/.

Related: When a user does not specify any path with /explorer/ then we should serve index.html.
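The two rules above can be sketched as a pure routing function, independent of any web framework. The Action type and the route function are hypothetical names; how am's web server actually dispatches requests is not described here.

```rust
/// Hypothetical description of what the web server should do for a path.
#[derive(Debug, PartialEq)]
enum Action {
    /// Send an HTTP redirect to the given location.
    Redirect(&'static str),
    /// Serve a file from the explorer's static assets.
    ServeFile(String),
    /// Anything else is left to the other handlers.
    Other,
}

fn route(path: &str) -> Action {
    match path {
        // The root of the web server redirects to the explorer.
        "" | "/" => Action::Redirect("/explorer/"),
        _ => match path.strip_prefix("/explorer/") {
            // No path after /explorer/ means index.html.
            Some("") => Action::ServeFile("index.html".to_string()),
            Some(rest) => Action::ServeFile(rest.to_string()),
            None => Action::Other,
        },
    }
}
```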

Verify checksum of downloaded archive

We currently download the archive but we do not verify it. We can download the checksum file from the GitHub release and verify it against that value. In this case it mostly protects against corrupt downloads, since we will retrieve the checksum file in the same way that we will download the archive itself.

If we wanted to protect against malicious archives being downloaded (ie. maybe a man in the middle attack), then we would need to get this checksum through some other means, such as letting the user provide it through a cli argument.
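A rough sketch of the comparison step, assuming the checksum file uses the line format that sha256sum prints ("<hex digest>  <file name>"). Computing the digest of the downloaded archive is left out and would need a hashing crate; both function names are hypothetical.

```rust
/// Look up the expected digest for `file_name` in a checksum file whose
/// lines look like "<hex digest>  <file name>".
fn expected_checksum<'a>(checksums: &'a str, file_name: &str) -> Option<&'a str> {
    checksums.lines().find_map(|line| {
        let mut parts = line.split_whitespace();
        let digest = parts.next()?;
        // sha256sum prefixes the name with '*' in binary mode.
        let name = parts.next()?.trim_start_matches('*');
        (name == file_name).then_some(digest)
    })
}

/// Compare the digest computed for the downloaded archive (`actual_hex`)
/// against the one listed in the checksum file.
fn verify_checksum(checksums: &str, file_name: &str, actual_hex: &str) -> Result<(), String> {
    let expected = expected_checksum(checksums, file_name)
        .ok_or_else(|| format!("no checksum listed for {file_name}"))?;
    if expected.eq_ignore_ascii_case(actual_hex) {
        Ok(())
    } else {
        Err(format!(
            "checksum mismatch for {file_name}: expected {expected}, got {actual_hex}"
        ))
    }
}
```

A user-supplied checksum from a CLI argument could be passed through the same comparison by building a one-line checksum string.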

Crate hermit-abi 0.3.1 is yanked

Switch to a different version of hermit-abi to resolve this issue.

Expose am as a homebrew package

Add redirect for `/graph` to `/explorer/graph`

The idea is that people can set the prometheus URL to http://localhost:6789 and then the explorer UI would parse the query parameters there and redirect the user to the relevant functions/graph page.
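As an illustration, the redirect target could be computed like this, keeping the query string intact so the explorer UI can parse it. The function name is hypothetical and the sketch does not depend on any web framework.

```rust
/// Map a request for /graph (with an optional query string) to the
/// corresponding /explorer/graph location. Returns None for other paths.
fn graph_redirect(path_and_query: &str) -> Option<String> {
    let (path, query) = match path_and_query.split_once('?') {
        Some((p, q)) => (p, Some(q)),
        None => (path_and_query, None),
    };
    if path != "/graph" {
        return None;
    }
    Some(match query {
        // Preserve the query parameters for the explorer UI.
        Some(q) => format!("/explorer/graph?{q}"),
        None => "/explorer/graph".to_string(),
    })
}
```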

Create reference docs in a markdown file

The CLI should generate a markdown file with a reference for all documented features in it. The markdown file will be loaded in the project's official documentation site: docs.autometrics.dev

Stretch goal: pass on the markdown file to the documentation repo and trigger rebuild automatically

Setup CI to do testing on each PR

Add more documentation on the push gateway in the CLI

Currently the flag --enable-pushgateway only says that it will start off a
pushgateway without explaining what it is and why you'd need it. We should add
more documentation explaining it.

Allow user to specify their own Prometheus instance

It would be nice if am detects if a Prometheus instance is already running (for instance via quickmetrics/docker) and won't download/run a local binary.

An alternative would be to run am with a --no-prometheus option that assumes that the user is already running, or connecting to a different Prometheus instance.

Periodically check if there is an update

am should periodically check if there is a new version available and guide the user into updating the binary. For installs through homebrew we should advise the user to use the homebrew command.

Rename --listen-address to --explorer-address

The flag --listen-address is ambiguous as to what exactly will be listening on
that port. We should consider renaming it to --explorer-listen-address or
--explorer-port.

Use an aggregation gateway instead of the Prometheus Pushgateway

The Pushgateway is meant for one-off batch jobs, not for long-running processes that keep updating the metrics. Metrics scraped from the Pushgateway will be incorrect when it's used with Autometrics.

We should use the prom-aggregation-gateway or the prometheus-gravel-gateway instead, both of which are built for aggregating metrics like we need for Autometrics.

Change working directory of prometheus and pushgateway

Currently the Prometheus and Pushgateway processes inherit the working directory of the am binary. Prometheus will create some files there for its storage. It would be better if we move that into $PWD/.autometrics/{prometheus,pushgateway} (not sure if it needs to be scoped to version as well). That way we can also create a command to clean it up.

Another thing related to this would be an --ephemeral flag which will delete the data after am shuts down.

windows compatibility

Implement proper error propagation

Currently when the prometheus install/run task fails it will not print this to the output and it looks like the application is running fine. Instead am should exit and display the error message.

Add better terminal experience

Add download bars to indicate download progress
Ask the user for certain parameters if omitted from the arguments
Ask the user for confirmation when doing potentially destructive actions

support `am init` command

This command would generate an am.toml file with some sensible defaults and ideally show a wizard that will ask the relevant questions:

what are the metric endpoints of your application
do you want push gateway

etc

Expose metrics of am itself

When using am start we start a webserver which can be scraped by other processes.

Optionally we should let the Prometheus instance scrape that endpoint. The reason why we probably don't want this by default is that it can be distracting and probably isn't useful when a user is working on their own application.

add updater/update message

users should get a notification if a new version is released and maybe even have the ability to upgrade with ease with just one command

Add support for Push gateway

This should download and start the push gateway. Its API should also be exposed through the webserver so that the user can manipulate their running push gateway.

If the user is using the Prometheus that am provides, then it should also add the gateway as a target (ie. the user should not have to setup any targets in this situation).

Do we enable the push gateway by default or should we make the user opt-in to be able to use it? We could enable the push gateway, if the user has not supplied any endpoints (ie. am start) and in other situations they have to use --push-gateway.

Decide if we want to do proxying for Prometheus and pushgateway

Currently we run Prometheus and Pushgateway on specific ports, and we also forward their content through the listen address of am plus a path prefix such as /prometheus. This requires us to pass the web address as configuration to both components. The downside is that each component's own web server then also uses the prefixed path, which can be confusing for users: they need to push their metrics to localhost:9091/pushgateway/metrics.

Pros of proxying:

We are able to swap out the implementation, but explorer would just rely on the proxied endpoint
Have a single port for all components
No CORS issues when any component is running on the same port as explorer

Cons:

Rewrites the path even for the locally running component.
Need to explain that the user has to use a different path for pushing metrics.

Add discord command

Open the Discord and Fiberplane server/channel from within am. The bun CLI has the same option.

Create README.md

Add a README.md which displays the following:

Add badges
Add screenshots

Add `explore` command

This command would be used by a user to open the explorer in their browser. It should also display the URL and allow for an option to not open the browser.

By default the URL that is opened should point to the local explorer, using its defaults.

The first argument to explore can be a different Prometheus endpoint. This will still use the same explorer instance but it will pass this Prometheus endpoint to explorer to use. Example: am explore https://prometheus.example.com.

We should also be able to use a different explorer endpoint, so this should be another argument. Example: am explore --explorer-endpoint https://explorer.autometrics.dev.

The defaults for the Prometheus endpoint and explorer endpoint should also come from an am configuration file for the user.

Open questions

Currently am start sets the Prometheus URL as a field in the HTML. In the scenario where we use a different explorer endpoint we have no control over the HTML, so we need a different way of passing the Prometheus URL to explorer. This would probably be a query-string parameter.

Publish container image containing am

We should first only push to the Docker registry. The Docker image should be a multi-arch image, containing both x86 and arm architectures.

In another PR we can support different registries such as:

AWS public registry
GitHub Package
Quay

Parse endpoint URL in am.toml using shorthand notation

We allow a user to submit endpoints using various shorthand forms, such as :3000. The docs mention that this is also allowed in the [[endpoint]].url property in an am.toml, but it is currently not supported there, because the value is parsed as a regular URL.

Show explorer, prometheus, pushgateway, etc details once everything is started

We should only display the details once the download and unpack is complete for both prometheus and pushgateway.

This is not an issue when the user already has the applications cached. But it does give a poor experience for a first-time user.

explicitly mention on startup that `am` is now sampling metrics from your supplied endpoint

am start :3000

Explorer endpoint: http://127.0.0.1:6789
Prometheus endpoint: http://127.0.0.1:9090

Now sampling your application at http://localhost:3000/metrics for metrics

Change default scrape frequency and allow users to customize it

When using am to monitor your local application it makes sense to get a quicker feedback loop. By setting the scrape interval lower than the Prometheus default, we can get data from your application into Prometheus faster and you'll be able to see a better resolution.

We should also allow the user to set a default value as a command line argument, or as a value in the am.toml. Any endpoint configured in the am.toml should also allow an override of the default.
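A small sketch of how the different sources could be combined. The precedence order (per-endpoint override, then command line, then am.toml) and the built-in 5 second fallback are assumptions; the issue does not fix either.

```rust
use std::time::Duration;

/// Assumed built-in fallback; the issue does not name a concrete value.
const DEFAULT_SCRAPE_INTERVAL: Duration = Duration::from_secs(5);

/// Pick the scrape interval for one endpoint. The most specific setting
/// wins: endpoint override, then CLI argument, then am.toml default.
fn effective_scrape_interval(
    endpoint_override: Option<Duration>,
    cli_default: Option<Duration>,
    config_default: Option<Duration>,
) -> Duration {
    endpoint_override
        .or(cli_default)
        .or(config_default)
        .unwrap_or(DEFAULT_SCRAPE_INTERVAL)
}
```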

Forward call to /metrics to pushgateway if pushgateway is enabled

The call to /metrics on the am web server should proxy this call to the pushgateway /pushgateway/metrics. This payload will contain the metrics from a component that is not able to be scraped, such as a function, batch job, or a client.

Note that we cannot redirect the request since it is a POST request. Even if we were to do a 307 I'm not sure how well that is implemented in all the different languages/packages. So it is easier just to proxy it.

Related discussion: #60 (comment)

Remove duplication in downloading/unpacking of prometheus and pushgateway

PR #42 introduced a bunch of duplication 🙈 We should be able to get rid of some

Setup release process

This should include the following (either automatic or manual):

Update the version in the Cargo.toml
Create a tag in git
Create a release in github
Upload the various artifacts to the github release
Update the homebrew tap repo with the new version

Show friendlier error in `am start` when scrape target is not yet up

When you run am start :1337 but port 1337 has no /metrics route, you get a warning log in the console. It'd be nice to say something like "target not yet up, but prometheus will still try to scrape it" (don't use that exact copy pls).

Move sloth file generation over from autometrics-rs repo

https://github.com/autometrics-dev/autometrics-rs/tree/main/autometrics-cli

And update docs to point to new location:

`mock` command

It would be nice if the CLI could serve an endpoint that would expose metrics that mock traffic on your service, using the actual function names from your code. It would probably want to use the same static analysis that we've discussed elsewhere to see what the function names are.

Ideally, it would also figure out the caller label, but that's harder.

Also, it might be nice if you had a visual dashboard with a bunch of knobs you could use to adjust the traffic and potentially errors for specific functions.

This, or something like it, would be one way of testing out your alerts or doing manual integration tests with them.

Make supplying the endpoints a bit smarter

We should make it as easy as possible for a user to supply the endpoints that they want to monitor. The only thing that is required is the host; all other components have defaults:

The protocol should only allow for http and https, where http is the default.
The port should follow the default for the protocol, 80 for http and 443 for https.
The path should default to /metrics if the path is empty. It should not be appended if a path is already there.

This should result in the following examples:

am start 127.0.0.1 would result in http://127.0.0.1:80/metrics
- All defaults
am start https://127.0.0.1 would result in https://127.0.0.1:443/metrics
- Non default protocol used
am start localhost:3030 would result in http://localhost:3030/metrics
- Non default port used
am start localhost:3030/api/metrics would result in http://localhost:3030/api/metrics
- Non default path used, so use that instead
am start localhost:3030/api/observability would result in http://localhost:3030/api/observability
- Non default path used, so use that instead (and do not add /metrics to it)

A user can also mix and match these: am start 127.0.0.1 https://localhost:3030/api/observability

NOTE: since we are adding the /metrics path when a path is not present in an endpoint, it is not possible to have Prometheus scrape the root of a webserver. I think this is an acceptable limitation, since it makes the "normal" usage much easier and users are unlikely to have their metrics at the / path.
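The defaulting rules above could be sketched with plain string handling as follows. The function name is hypothetical, a bare "/" path is treated the same as an empty path (an assumption in line with the note above), and IPv6 literals are not handled.

```rust
/// Expand a user-supplied endpoint into a full metrics URL by filling in
/// the default protocol (http), port (80 or 443) and path (/metrics).
fn expand_endpoint(input: &str) -> Result<String, String> {
    // Split off an explicit protocol, defaulting to http.
    let (scheme, rest) = input.split_once("://").unwrap_or(("http", input));
    if scheme != "http" && scheme != "https" {
        return Err(format!("unsupported protocol: {scheme}"));
    }

    // Everything from the first '/' onwards is the path.
    let (authority, path) = match rest.find('/') {
        Some(idx) => (&rest[..idx], &rest[idx..]),
        None => (rest, ""),
    };

    // Split host and optional port.
    let (host, port) = match authority.rsplit_once(':') {
        Some((h, p)) => (h, Some(p)),
        None => (authority, None),
    };
    if host.is_empty() {
        return Err("a host is required".to_string());
    }
    let port: u16 = match port {
        Some(p) => p.parse().map_err(|_| format!("invalid port: {p}"))?,
        None => if scheme == "https" { 443 } else { 80 },
    };

    // Only add /metrics when no path was given; never append to an existing one.
    let path = if path.is_empty() || path == "/" { "/metrics" } else { path };
    Ok(format!("{scheme}://{host}:{port}{path}"))
}
```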

Do not display the stdout/stderr from Prometheus

Displaying this information is quite verbose and not super useful, since it is in a different format which makes reading it quite confusing.

We should store this data into a buffer (maybe only collecting the last x bytes) and then expose this to the user.

The main way of retrieving this information would be through am's API. The explorer can display this information. We can even create a live view of these logs through WebSockets or server-sent events.

Another way of using this data would be if an error occurred, that am would dump these logs into a file. These can then be used by the user when creating a bug report.

Finally, a CLI argument can be added that will make the Prometheus stdout/stderr go into am's stdout/stderr.
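The "last x bytes" buffer mentioned above could look roughly like this. TailBuffer is a hypothetical name, and the sketch covers only the buffering itself, not the wiring to the child process or the API.

```rust
use std::collections::VecDeque;

/// Keeps only the most recent `cap` bytes of everything pushed into it.
struct TailBuffer {
    cap: usize,
    data: VecDeque<u8>,
}

impl TailBuffer {
    fn new(cap: usize) -> Self {
        Self { cap, data: VecDeque::with_capacity(cap) }
    }

    /// Append a chunk of output, discarding the oldest bytes on overflow.
    fn push(&mut self, chunk: &[u8]) {
        // Only the last `cap` bytes of an oversized chunk can survive.
        let chunk = if chunk.len() > self.cap {
            &chunk[chunk.len() - self.cap..]
        } else {
            chunk
        };
        let overflow = (self.data.len() + chunk.len()).saturating_sub(self.cap);
        self.data.drain(..overflow);
        self.data.extend(chunk.iter().copied());
    }

    /// Return the buffered bytes as text, replacing invalid UTF-8.
    fn contents(&self) -> String {
        let bytes: Vec<u8> = self.data.iter().copied().collect();
        String::from_utf8_lossy(&bytes).into_owned()
    }
}
```

Because the buffer is byte-based, the oldest retained line may be cut in the middle; a line-based variant would avoid that at the cost of a less predictable memory bound.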

Set honor_labels to true when pushgateway is enabled

The default prometheus config has this set to false, but it is more useful to have it set to true when scraping from pushgateway:

honor_labels controls how Prometheus handles conflicts between labels that are
already present in scraped data and labels that Prometheus would attach
server-side ("job" and "instance" labels, manually configured target
labels, and labels generated by service discovery implementations).

If honor_labels is set to "true", label conflicts are resolved by keeping label
values from the scraped data and ignoring the conflicting server-side labels.

If honor_labels is set to "false", label conflicts are resolved by renaming
conflicting labels in the scraped data to "exported_<original-label>" (for
example "exported_instance", "exported_job") and then attaching server-side
labels.

Setting honor_labels to "true" is useful for use cases such as federation and
scraping the Pushgateway, where all labels specified in the target should be
preserved.

Note that any globally configured "external_labels" are unaffected by this
setting. In communication with external systems, they are always applied only
when a time series does not have a given label yet and are ignored otherwise.

https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
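For illustration, a scrape config entry with honor_labels set could be generated as below. The job_name, honor_labels, static_configs and targets keys are standard Prometheus configuration keys; the function name and the example target are hypothetical, and this is not how am necessarily builds its config.

```rust
/// Render one entry for the `scrape_configs` list of a Prometheus config.
/// For the pushgateway job, `honor_labels` would be passed as true.
fn scrape_config_entry(job_name: &str, target: &str, honor_labels: bool) -> String {
    format!(
        "- job_name: {job_name}\n  honor_labels: {honor_labels}\n  static_configs:\n    - targets: ['{target}']\n"
    )
}
```

Calling scrape_config_entry("am_pushgateway", "localhost:9091", true) would produce an entry that keeps the job and instance labels pushed by the client instead of renaming them to exported_job and exported_instance.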

Update default prometheus version to latest prometheus version

Check if the provided endpoints work for am

This can already be an indicator that the user has used an incorrect configuration.

When invoking am start [endpoint] we can already make a request to the provided endpoints and see if they work and contain metrics data. If they do not work we can provide a warning, but we should not stop execution of am, since the user might not have started their application.

Depending on the application being up and running is a bad experience. Perhaps the workflow of the user is to just turn on am at the start of their development cycle and then just leave it running.

Print out details related to explorer, prometheus, etc

Some of these options would only be shown depending on arguments that are passed into the application. So if the pushgateway is not running, then it shouldn't show this endpoint.

Some potential details that could be interesting:

Explorer endpoint
Prometheus endpoint
- Deep link to targets depending on the metric endpoints (this could be hidden behind a --verbose flag)
Pushgateway endpoint
Grafana endpoint
- Path to the local disk where the dashboards are defined
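A minimal sketch of the conditional output described above, where optional components are only listed when they are running. The Endpoints struct and startup_summary function are hypothetical names; the label wording follows the list above.

```rust
/// Hypothetical collection of the endpoints am has started.
struct Endpoints<'a> {
    explorer: &'a str,
    prometheus: &'a str,
    /// None when the pushgateway was not enabled.
    pushgateway: Option<&'a str>,
    /// None when Grafana was not enabled.
    grafana: Option<&'a str>,
}

/// Build the summary shown to the user once everything is started.
fn startup_summary(e: &Endpoints) -> String {
    let mut lines = vec![
        format!("Explorer endpoint: {}", e.explorer),
        format!("Prometheus endpoint: {}", e.prometheus),
    ];
    // Components that are not running are left out entirely.
    if let Some(url) = e.pushgateway {
        lines.push(format!("Pushgateway endpoint: {url}"));
    }
    if let Some(url) = e.grafana {
        lines.push(format!("Grafana endpoint: {url}"));
    }
    lines.join("\n")
}
```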

Setup review group for this repo

There is the rust-maintainers group that should be assigned by default.

Allow for only a port to be used as a metric endpoint

Passing in : plus a number would result in using http and localhost as the defaults. It is still possible to use a path after this.

So am start :3000 would become http://localhost:3000/metrics.

See #9 for more details on the previous work.
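The port-only form could be handled by a small pre-processing step like the one below, sketched with a hypothetical function name. Anything that does not start with ":" is left for the regular endpoint parsing.

```rust
/// Expand ":<port>[/path]" into a full URL with http and localhost as
/// the defaults. Returns None when the input is not a port-only shorthand.
fn expand_port_shorthand(input: &str) -> Option<String> {
    let rest = input.strip_prefix(':')?;
    // A path after the port is kept; otherwise /metrics is used.
    let (port, path) = match rest.find('/') {
        Some(idx) => (&rest[..idx], &rest[idx..]),
        None => (rest, "/metrics"),
    };
    let port: u16 = port.parse().ok()?;
    Some(format!("http://localhost:{port}{path}"))
}
```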

Allow defaults to be defined in a toml file

This will allow us to define all the endpoints, whether to enable pushgateway and others in the file which is going to be stored in the repository itself. This makes it easier for users that work on the same codebase to quickly get up and running with am with their local development flow.

Possible locations (note we could try them all):

.autometrics/am.toml
am.toml

Edit: Changed to toml since that is probably easier to work with for our users.

Allow user to specify the name of the job

Each target can have a name. This will be exposed as the job label in the metrics. It would be useful for the user to specify these names.

Currently we just have app_{} where the braces will contain an incrementing number (to avoid collisions).

With a config file we can simply add another property, something like name. For the terminal approach, we might want to include a prefix or postfix. So something like api|:3000. The first part would be the name of the job, in this case api, and the part after the | will just be parsed as it currently is.
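The terminal form proposed above could be split like this. The function name is hypothetical, and falling back to app_{index} when the name part is missing or empty is an assumption based on the current naming.

```rust
/// Split an argument such as "api|:3000" into a job name and the endpoint
/// part. Without a name, the existing app_{index} scheme is used.
fn split_job_name(arg: &str, index: usize) -> (String, &str) {
    match arg.split_once('|') {
        Some((name, endpoint)) if !name.is_empty() => (name.to_string(), endpoint),
        // An empty name before the '|' falls back to the generated one.
        Some((_, endpoint)) => (format!("app_{index}"), endpoint),
        None => (format!("app_{index}"), arg),
    }
}
```

The endpoint part returned here would then go through the same shorthand parsing as any other endpoint argument.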