mongodbatlas_exporter's Issues

Instrument the Atlas HTTP Client RoundTripper

I added a special error type, HTTPError, in #14. That was outside the scope of the PR, so I did not focus on doing it correctly. Sometimes doing things inefficiently helps us learn to do them more efficiently.

Instead of handling and counting each error code wherever it occurs in the code, it is much better practice to wrap the RoundTripper of the HTTP client. The wrapper acts as a sort of middleware from which we can instrument all HTTP calls at a single point.

Specifically we need to wrap the round tripper with InstrumentRoundTripperCounter. It will partition the CounterVec by method and code. This is exactly what we want.

The Prometheus team has provided an example in their tests:
https://github.com/prometheus/client_golang/blob/0400fc44d42dd0bca7fb16e87ea0313bb2eb8c53/prometheus/promhttp/instrument_client_test.go#L204-L210
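
To illustrate the wrapping idea without pulling in any dependency, here is a minimal sketch that uses only the Go standard library. It is not the promhttp implementation: countingRoundTripper, roundTripperFunc, the fake transport and the atlas.invalid host are all hypothetical names invented for this example, and a plain map stands in for the CounterVec. What it shows is the single instrumentation point: every request made through the client passes through one RoundTrip method, which partitions a counter by method and status code.

```go
package main

import (
	"fmt"
	"net/http"
	"strconv"
	"sync"
)

// countingRoundTripper is a hypothetical stand-in for an instrumented
// RoundTripper: it counts completed requests by method and status code.
type countingRoundTripper struct {
	next   http.RoundTripper
	mu     sync.Mutex
	counts map[string]int // key: "METHOD code", e.g. "GET 200"
}

func newCountingRoundTripper(next http.RoundTripper) *countingRoundTripper {
	return &countingRoundTripper{next: next, counts: make(map[string]int)}
}

// RoundTrip delegates to the wrapped transport and records the outcome.
func (c *countingRoundTripper) RoundTrip(req *http.Request) (*http.Response, error) {
	resp, err := c.next.RoundTrip(req)
	if err != nil {
		// Transport errors carry no status code, so they are not counted here.
		return nil, err
	}
	c.mu.Lock()
	c.counts[req.Method+" "+strconv.Itoa(resp.StatusCode)]++
	c.mu.Unlock()
	return resp, nil
}

// count returns how many responses were seen for a method and status code.
func (c *countingRoundTripper) count(method string, code int) int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.counts[method+" "+strconv.Itoa(code)]
}

// roundTripperFunc lets a plain function act as a RoundTripper (test helper).
type roundTripperFunc func(*http.Request) (*http.Response, error)

func (f roundTripperFunc) RoundTrip(r *http.Request) (*http.Response, error) { return f(r) }

// demo sends three requests through a fake transport (no network involved)
// and returns the counters for GET 200 and GET 404.
func demo() (ok, notFound int) {
	fake := roundTripperFunc(func(req *http.Request) (*http.Response, error) {
		code := http.StatusOK
		if req.URL.Path == "/missing" {
			code = http.StatusNotFound
		}
		return &http.Response{
			StatusCode: code,
			Header:     make(http.Header),
			Body:       http.NoBody,
			Request:    req,
		}, nil
	})

	rt := newCountingRoundTripper(fake)
	client := &http.Client{Transport: rt}
	for _, path := range []string{"/ok", "/ok", "/missing"} {
		resp, err := client.Get("http://atlas.invalid" + path)
		if err != nil {
			panic(err)
		}
		resp.Body.Close()
	}
	return rt.count("GET", 200), rt.count("GET", 404)
}

func main() {
	ok, notFound := demo()
	fmt.Printf("GET 200: %d, GET 404: %d\n", ok, notFound)
}
```

In the exporter itself the map would presumably be replaced by a CounterVec with code and method labels, and the wrapper by InstrumentRoundTripperCounter, so none of the calling code needs to count errors on its own.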

Checked VS. Unchecked Exporter

Summary

  1. Register a collector per instance/disk.
  2. Goroutine that listens for new instances/disks to register.
  3. Methods to Unregister and Reregister instances/disks as labels change?

Exploration

Due to the dynamic nature of this exporter, we may want to investigate using an unchecked exporter, that is, one whose Describe method returns no descriptions of its metrics. We need a better understanding of the downsides of an unchecked exporter, or a better way to manage collectors that lets them remain checked.

The github.com/prometheus/client_golang/prometheus documentation indicates that our use case is expected and describes the pitfalls of checked vs. unchecked.

There is a more involved use case, too: If you already have metrics available, created outside of the Prometheus context, you don't need the interface of the various Metric types. You essentially want to mirror the existing numbers into Prometheus Metrics during collection.

Creation of the Metric instance happens in the Collect method. The Describe method has to return separate Desc instances, representative of the “throw-away” metrics to be created later. NewDesc comes in handy to create those Desc instances. Alternatively, you could return no Desc at all, which will mark the Collector “unchecked”. No checks are performed at registration time, but metric consistency will still be ensured at scrape time, i.e. any inconsistencies will lead to scrape errors. Thus, with unchecked Collectors, the responsibility to not collect metrics that lead to inconsistencies in the total scrape result lies with the implementer of the Collector. While this is not a desirable state, it is sometimes necessary. The typical use case is a situation where the exact metrics to be returned by a Collector cannot be predicted at registration time, but the implementer has sufficient knowledge of the whole system to guarantee metric consistency.

The big question now is what they mean by "metric consistency". I think I have seen an example of such an inconsistency with the stackdriver exporter: it reported duplicate versions of a metric, which caused a panic and a crash.

I think, though, that the stackdriver exporter sets a good example for how we can keep our exporters checked. Currently we register only two collectors at startup: one for disks and one for processes.

The stackdriver exporter creates a Collector per project you ask it to scrape metrics from: https://github.com/prometheus-community/stackdriver_exporter/blob/16401d6cce781e5d99615e9518f220dbf56d6f0b/stackdriver_exporter.go#L124-L131

So what we would want is to create a collector per instance/disk and register it with the metrics found when the collector starts.

If an instance restarts, many of its labels and metrics will change. So we should unregister the collector and register a new one if the labels change.
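
The register, unregister and re-register flow described above could be sketched roughly as follows. This is only an illustration of the bookkeeping, written against the standard library: tracker, labelKey, reconcile and the host-a instance are hypothetical names, and the register/unregister callbacks stand in for whatever the exporter would actually do (presumably calls into the Prometheus registry with a per-instance collector).

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// labelKey builds a deterministic fingerprint of a label set so that two
// label sets can be compared regardless of map iteration order.
func labelKey(labels map[string]string) string {
	names := make([]string, 0, len(labels))
	for n := range labels {
		names = append(names, n)
	}
	sort.Strings(names)
	var b strings.Builder
	for _, n := range names {
		fmt.Fprintf(&b, "%s=%q;", n, labels[n])
	}
	return b.String()
}

// tracker remembers the label set each instance was registered with.
// The callbacks are placeholders for real registry operations.
type tracker struct {
	registered map[string]string // instance ID -> label fingerprint
	register   func(id string, labels map[string]string)
	unregister func(id string)
}

// reconcile compares freshly discovered instances with the registered ones:
// new instances are registered, instances whose labels changed are
// unregistered and registered again, and vanished instances are unregistered.
func (t *tracker) reconcile(discovered map[string]map[string]string) {
	for id, labels := range discovered {
		key := labelKey(labels)
		old, known := t.registered[id]
		if known && old == key {
			continue // nothing changed for this instance
		}
		if known {
			t.unregister(id)
		}
		t.register(id, labels)
		t.registered[id] = key
	}
	for id := range t.registered {
		if _, still := discovered[id]; !still {
			t.unregister(id)
			delete(t.registered, id)
		}
	}
}

// demo walks one instance through discovery, a no-op refresh, a label
// change (for example after a restart) and removal, and returns the events.
func demo() string {
	var events []string
	t := &tracker{
		registered: make(map[string]string),
		register:   func(id string, _ map[string]string) { events = append(events, "register "+id) },
		unregister: func(id string) { events = append(events, "unregister "+id) },
	}
	v1 := map[string]map[string]string{"host-a": {"version": "4.4", "type": "PRIMARY"}}
	v2 := map[string]map[string]string{"host-a": {"version": "5.0", "type": "PRIMARY"}}

	t.reconcile(v1)                             // first sighting
	t.reconcile(v1)                             // unchanged
	t.reconcile(v2)                             // labels changed
	t.reconcile(map[string]map[string]string{}) // instance gone
	return strings.Join(events, ",")
}

func main() {
	fmt.Println(demo())
}
```

A goroutine that periodically lists instances and disks could call reconcile with each fresh result, which covers steps 2 and 3 of the summary in one place.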

Sometimes, some metrics are missing

We sometimes rely on mongodbatlas_processes_stats_connections to list clusters in our dashboards, but we've noticed that this metric occasionally does not exist for processes.

Connection Reset by Peer

We recently observed "connection reset by peer" errors when connecting to a MongoDB Atlas instance. The exporter runs for some time and then fails with the errors below. We need the exporter to handle connection failures automatically instead of requiring a restart.

{"err":"Get "https://cloud.mongodb.com/api/atlas/v1.0/groups//processes": read tcp :47368->:443: read: connection reset by peer","level":"error","msg":"failed to list processes of the project","project":","timestamp":"2021-12-02T02:10:40.60864571Z"}
panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x0 pc=0x7c785e]

goroutine 22 [running]:
mongodbatlas_exporter/mongodbatlas.(*AtlasClient).ListProcesses(0xc0001a9080)
/go/mongodbatlas_exporter/mongodbatlas/mongodbatlas.go:69 +0x31e
mongodbatlas_exporter/registerer.(*ProcessRegisterer).registerAtlasProcesses(0xc000260990)
/go/mongodbatlas_exporter/registerer/process_registerer.go:51 +0x42
mongodbatlas_exporter/registerer.(*ProcessRegisterer).Observe(0xc000260990)
/go/mongodbatlas_exporter/registerer/process_registerer.go:43 +0x1e
created by main.main
/go/mongodbatlas_exporter/main.go:56 +0x434

**When we try to restart the pod, it fails to start again and goes down with the above error.**

Please have a look at it and let us know if you need any additional information.
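
The trace above shows a nil pointer dereference inside ListProcesses right after the request failed. The actual cause is not confirmed here; one common way this happens in Go is using a result after the call that produced it returned an error. The sketch below shows the defensive pattern under that assumption, together with a simple retry so that a transient connection reset does not take the exporter down. processList, fetchFunc and listWithRetry are hypothetical names, not part of the exporter or of the Atlas client.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// processList is a hypothetical stand-in for the API response.
type processList struct {
	Hostnames []string
}

// fetchFunc is a hypothetical stand-in for the API call. On failure it
// returns a nil result together with a non-nil error.
type fetchFunc func() (*processList, error)

// listWithRetry calls fetch up to attempts times. It never touches the
// result unless the call succeeded, and it returns an error instead of
// panicking when every attempt fails.
func listWithRetry(fetch fetchFunc, attempts int, delay time.Duration) ([]string, error) {
	if attempts < 1 {
		attempts = 1
	}
	var lastErr error
	for i := 0; i < attempts; i++ {
		res, err := fetch()
		if err == nil && res != nil {
			return res.Hostnames, nil
		}
		if err == nil {
			err = errors.New("empty response")
		}
		lastErr = err
		if i < attempts-1 {
			time.Sleep(delay) // fixed delay; a real exporter might back off
		}
	}
	return nil, fmt.Errorf("listing processes failed after %d attempts: %w", attempts, lastErr)
}

// demo simulates two connection resets followed by a successful call and
// returns the number of hostnames received.
func demo() (int, error) {
	calls := 0
	flaky := func() (*processList, error) {
		calls++
		if calls < 3 {
			return nil, errors.New("read: connection reset by peer")
		}
		return &processList{Hostnames: []string{"host-a", "host-b"}}, nil
	}
	hosts, err := listWithRetry(flaky, 5, time.Millisecond)
	return len(hosts), err
}

func main() {
	n, err := demo()
	fmt.Println(n, err)
}
```

If all attempts fail, the caller gets an error it can log and try again on the next cycle, rather than a crash of the whole process.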

List of metrics?

Can you add a list of the metrics currently exported by this exporter?
