
mongodbatlas_exporter's Introduction

Mongodbatlas exporter for Prometheus

Archived: NO LONGER MAINTAINED

Atlas released a Prometheus integration that provides much more reliable metrics without having to circumvent API rate limits.

See Introducing MongoDB’s Prometheus Monitoring Integration.

Limitations

  • The exporter supports up to 30 processes (mongod and mongos)

Process count calculation:
1 sharded cluster with 3 shards = 3x3 shard mongod processes + 3x3 mongos processes + 3x1 config mongod processes = 21
1 non-sharded cluster (replica set) = 3x1 mongod processes = 3

  • The minimum scrape interval should be 1m

Configuration

mongodbatlas_exporter doesn't require any configuration file; the available flags are listed below:

usage: mongodbatlas_exporter [<flags>]

Flags:
  --help                    Show context-sensitive help (also try --help-long and --help-man).
  --listen-address=":9905"  The address to listen on for HTTP requests.
  --atlas.public-key=ATLAS.PUBLIC-KEY
                            Atlas API public key
  --atlas.private-key=ATLAS.PRIVATE-KEY
                            Atlas API private key
  --atlas.project-id=ATLAS.PROJECT-ID
                            Atlas project id (group id) to scrape metrics from
  --atlas.cluster=ATLAS.CLUSTER ...
                            Atlas cluster name to scrape metrics from. Can be defined multiple times. If not defined all clusters in the project will be scraped
  --log-level=debug         Printed logs level.
  --version                 Show application version.

mongodbatlas_exporter's People

Contributors

freyert, melifaro


mongodbatlas_exporter's Issues

Sometimes, some metrics are missing

We sometimes rely on mongodbatlas_processes_stats_connections to list clusters in our dashboards, but we've noticed that this metric occasionally does not exist for processes.

List of metrics?

Can you add a list of the metrics currently exported by this exporter?

Connection Reset by Peer

We recently observed "connection reset by peer" errors when connecting to a MongoDB Atlas instance. The exporter runs for some time and then fails with the error below. We need the exporter to recover from connection failures automatically instead of requiring a restart.

{"err":"Get "https://cloud.mongodb.com/api/atlas/v1.0/groups//processes": read tcp :47368->:443: read: connection reset by peer","level":"error","msg":"failed to list processes of the project","project":","timestamp":"2021-12-02T02:10:40.60864571Z"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7c785e]

goroutine 22 [running]:
mongodbatlas_exporter/mongodbatlas.(*AtlasClient).ListProcesses(0xc0001a9080)
/go/mongodbatlas_exporter/mongodbatlas/mongodbatlas.go:69 +0x31e
mongodbatlas_exporter/registerer.(*ProcessRegisterer).registerAtlasProcesses(0xc000260990)
/go/mongodbatlas_exporter/registerer/process_registerer.go:51 +0x42
mongodbatlas_exporter/registerer.(*ProcessRegisterer).Observe(0xc000260990)
/go/mongodbatlas_exporter/registerer/process_registerer.go:43 +0x1e
created by main.main
/go/mongodbatlas_exporter/main.go:56 +0x434

**When we try to restart the pod, it fails to start again and goes down with the above error.**

Please have a look at it and let us know if you need any additional information.

Instrument the Atlas HTTP Client RoundTripper

I've added a special Error type, HTTPError, in #14. This was outside the scope of the PR, so I didn't focus on doing it correctly. Sometimes doing things inefficiently helps us learn to do them more efficiently.

Instead of handling/counting each error code wherever it occurs in the code, it is much better practice to wrap the RoundTripper of an HTTP client. The wrapper acts as a sort of middleware from which we can instrument all HTTP calls at a single point.

Specifically, we need to wrap the round tripper with InstrumentRoundTripperCounter, which partitions the CounterVec by method and code. This is exactly what we want.

The Prometheus team has provided an example in their tests:
https://github.com/prometheus/client_golang/blob/0400fc44d42dd0bca7fb16e87ea0313bb2eb8c53/prometheus/promhttp/instrument_client_test.go#L204-L210

Checked vs. Unchecked Exporter

Summary

  1. Register a collector per instance/disk.
  2. Add a goroutine that listens for new instances/disks to register.
  3. Add methods to unregister and re-register instances/disks as their labels change?

Exploration

Due to the dynamic nature of this exporter, we may want to investigate using an unchecked exporter: one that returns no description of its metrics. We need a better understanding of the downsides of an unchecked exporter, or a better method for managing collectors that allows them to remain checked.

The github.com/prometheus/client_golang/prometheus documentation indicates that our use case is expected and describes the pitfalls of checked vs. unchecked.

There is a more involved use case, too: If you already have metrics available, created outside of the Prometheus context, you don't need the interface of the various Metric types. You essentially want to mirror the existing numbers into Prometheus Metrics during collection.

Creation of the Metric instance happens in the Collect method. The Describe method has to return separate Desc instances, representative of the “throw-away” metrics to be created later. NewDesc comes in handy to create those Desc instances. Alternatively, you could return no Desc at all, which will mark the Collector “unchecked”. No checks are performed at registration time, but metric consistency will still be ensured at scrape time, i.e. any inconsistencies will lead to scrape errors. Thus, with unchecked Collectors, the responsibility to not collect metrics that lead to inconsistencies in the total scrape result lies with the implementer of the Collector. While this is not a desirable state, it is sometimes necessary. The typical use case is a situation where the exact metrics to be returned by a Collector cannot be predicted at registration time, but the implementer has sufficient knowledge of the whole system to guarantee metric consistency.

The big question is what they mean by "metric consistency". I think I have seen an example of this inconsistency with the stackdriver exporter: it reported duplicate versions of a metric, which caused a panic and crash.

I think, though, that the stackdriver exporter sets a good example for how we can keep our collectors checked. Currently we register only 2 collectors at startup: one for disks and one for processes.

The stackdriver exporter creates a Collector per project that you ask it to scrape metrics from: https://github.com/prometheus-community/stackdriver_exporter/blob/16401d6cce781e5d99615e9518f220dbf56d6f0b/stackdriver_exporter.go#L124-L131

So what we want is to create a collector per instance/disk, register it with the metrics found when the collector starts, and success!

If an instance restarts, many of its labels and metrics will change. So we should unregister the collector and register a new one if the labels change.
