
go-grpc-prometheus's Introduction

(Deprecated) Go gRPC Interceptors for Prometheus monitoring


โš ๏ธ This project is depreacted and archived as the functionality moved to go-grpc-middleware repo since provider/[email protected] release. You can pull it using go get github.com/grpc-ecosystem/go-grpc-middleware/providers/prometheus. The API is simplified and morernized, yet functionality is similar to what v1.2.0 offered. All questions and issues you can submit here.

Prometheus monitoring for your gRPC Go servers and clients.

A sister implementation for gRPC Java (same metrics, same semantics) is in grpc-ecosystem/java-grpc-prometheus.

Interceptors

gRPC Go recently acquired support for Interceptors, i.e. middleware that is executed by a gRPC Server before the request is passed on to the user's application logic. It is a perfect way to implement common patterns: auth, logging and... monitoring.

To use Interceptors in chains, please see go-grpc-middleware.

This library requires Go 1.9 or later.

Usage

There are two types of interceptors: client-side and server-side. This package provides monitoring Interceptors for both.

Server-side

import "github.com/grpc-ecosystem/go-grpc-prometheus"
...
    // Initialize your gRPC server's interceptor.
    myServer := grpc.NewServer(
        grpc.StreamInterceptor(grpc_prometheus.StreamServerInterceptor),
        grpc.UnaryInterceptor(grpc_prometheus.UnaryServerInterceptor),
    )
    // Register your gRPC service implementations.
    myservice.RegisterMyServiceServer(myServer, &myServiceImpl{})
    // After all your registrations, make sure all of the Prometheus metrics are initialized.
    grpc_prometheus.Register(myServer)
    // Register Prometheus metrics handler.    
    http.Handle("/metrics", promhttp.Handler())
...

Client-side

import "github.com/grpc-ecosystem/go-grpc-prometheus"
...
    clientConn, err := grpc.Dial(
        address,
        grpc.WithUnaryInterceptor(grpc_prometheus.UnaryClientInterceptor),
        grpc.WithStreamInterceptor(grpc_prometheus.StreamClientInterceptor),
    )
    client := myservice.NewMyServiceClient(clientConn)
    resp, err := client.PingEmpty(ctx, &myservice.Request{Msg: "hello"})
...

Metrics

Labels

All server-side metrics start with grpc_server as the Prometheus subsystem name; all client-side metrics start with grpc_client. The two are mirror concepts, and all methods carry the same rich labels:

  • grpc_service - the gRPC service name, which is the combination of the protobuf package and the service name. E.g. for package = mwitkow.testproto and service TestService the label will be grpc_service="mwitkow.testproto.TestService"

  • grpc_method - the name of the method called on the gRPC service. E.g.
    grpc_method="Ping"

  • grpc_type - the gRPC type of the request. Differentiating between request types is important, especially for latency measurements.

    • unary is single request, single response RPC
    • client_stream is a multi-request, single response RPC
    • server_stream is a single request, multi-response RPC
    • bidi_stream is a multi-request, multi-response RPC

Additionally for completed RPCs, the following labels are used:

  • grpc_code - the human-readable gRPC status code. The list of all statuses is too long, but here are some common ones:

    • OK - means the RPC was successful
    • InvalidArgument - RPC contained bad values
    • Internal - server-side error not disclosed to the clients

Counters

The counters and their up-to-date documentation are in server_reporter.go and client_reporter.go, and they are exposed via the respective Prometheus handler (usually /metrics).

For the purpose of this documentation, we will only discuss the grpc_server metrics; the grpc_client ones mirror them.

For simplicity, let's assume we're tracking a single server-side RPC call of mwitkow.testproto.TestService, calling the method PingList. The call succeeds and returns 20 messages in the stream.

First, immediately after the server receives the call it will increment the grpc_server_started_total and start the handling time clock (if histograms are enabled).

grpc_server_started_total{grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream"} 1

Then the user logic gets invoked. It receives one message from the client containing the request (it's a server_stream):

grpc_server_msg_received_total{grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream"} 1

The user logic may return an error, or send multiple messages back to the client. In this case, on each of the 20 messages sent back, a counter will be incremented:

grpc_server_msg_sent_total{grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream"} 20

After the call completes, its status (OK or other gRPC status code) and the relevant call labels increment the grpc_server_handled_total counter.

grpc_server_handled_total{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream"} 1

Histograms

Prometheus histograms are a great way to measure latency distributions of your RPCs. However, since it is bad practice to have metrics of high cardinality, the latency monitoring metrics are disabled by default. To enable them, call the following in your server initialization code:

grpc_prometheus.EnableHandlingTimeHistogram()
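
If the default buckets don't fit your latency profile, the same call accepts options. A minimal sketch, assuming the HistogramOption API this package ships in metric_options.go:

grpc_prometheus.EnableHandlingTimeHistogram(
	// Assumption: WithHistogramBuckets is the HistogramOption exposed by this package.
	grpc_prometheus.WithHistogramBuckets([]float64{0.001, 0.01, 0.1, 0.3, 1, 5}),
)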

After the call completes, its handling time will be recorded in a Prometheus histogram variable grpc_server_handling_seconds. The histogram variable contains three sub-metrics:

  • grpc_server_handling_seconds_count - the count of all completed RPCs by status and method
  • grpc_server_handling_seconds_sum - cumulative time of RPCs by status and method, useful for calculating average handling times
  • grpc_server_handling_seconds_bucket - contains the counts of RPCs by status and method in respective handling-time buckets. These buckets can be used by Prometheus to estimate SLAs (see here)

The counter values will look as follows:

grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="0.005"} 1
grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="0.01"} 1
grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="0.025"} 1
grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="0.05"} 1
grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="0.1"} 1
grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="0.25"} 1
grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="0.5"} 1
grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="1"} 1
grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="2.5"} 1
grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="5"} 1
grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="10"} 1
grpc_server_handling_seconds_bucket{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream",le="+Inf"} 1
grpc_server_handling_seconds_sum{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream"} 0.0003866430000000001
grpc_server_handling_seconds_count{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream"} 1

Useful query examples

The Prometheus philosophy is to provide raw metrics to the monitoring system and let the aggregations be handled there. The verbosity of the above metrics makes that flexibility possible. Here are a couple of useful monitoring queries:

request inbound rate

sum(rate(grpc_server_started_total{job="foo"}[1m])) by (grpc_service)

For job="foo" (common label to differentiate between Prometheus monitoring targets), calculate the rate of requests per second (1 minute window) for each gRPC grpc_service that the job has. Please note how the grpc_method is being omitted here: all methods of a given gRPC service will be summed together.

unary request error rate

sum(rate(grpc_server_handled_total{job="foo",grpc_type="unary",grpc_code!="OK"}[1m])) by (grpc_service)

For job="foo", calculate the per-grpc_service rate of unary (1:1) RPCs that failed, i.e. the ones that didn't finish with OK code.

unary request error percentage

sum(rate(grpc_server_handled_total{job="foo",grpc_type="unary",grpc_code!="OK"}[1m])) by (grpc_service)
 / 
sum(rate(grpc_server_started_total{job="foo",grpc_type="unary"}[1m])) by (grpc_service)
 * 100.0

For job="foo", calculate the percentage of failed requests by service. It's easy to notice that this is a combination of the two above examples. This is an example of a query you would like to alert on in your system for SLA violations, e.g. "no more than 1% requests should fail".

average response stream size

sum(rate(grpc_server_msg_sent_total{job="foo",grpc_type="server_stream"}[10m])) by (grpc_service)
 /
sum(rate(grpc_server_started_total{job="foo",grpc_type="server_stream"}[10m])) by (grpc_service)

For job="foo" what is the grpc_service-wide 10m average of messages returned for all server_stream RPCs. This allows you to track the stream sizes returned by your system, e.g. allows you to track when clients started to send "wide" queries that ret Note the divisor is the number of started RPCs, in order to account for in-flight requests.

99%-tile latency of unary requests

histogram_quantile(0.99, 
  sum(rate(grpc_server_handling_seconds_bucket{job="foo",grpc_type="unary"}[5m])) by (grpc_service,le)
)

For job="foo", returns an 99%-tile quantile estimation of the handling time of RPCs per service. Please note the 5m rate, this means that the quantile estimation will take samples in a rolling 5m window. When combined with other quantiles (e.g. 50%, 90%), this query gives you tremendous insight into the responsiveness of your system (e.g. impact of caching).

percentage of slow unary queries (>250ms)

100.0 - (
sum(rate(grpc_server_handling_seconds_bucket{job="foo",grpc_type="unary",le="0.25"}[5m])) by (grpc_service)
 / 
sum(rate(grpc_server_handling_seconds_count{job="foo",grpc_type="unary"}[5m])) by (grpc_service)
) * 100.0

For job="foo" calculate the by-grpc_service fraction of slow requests that took longer than 0.25 seconds. This query is relatively complex, since the Prometheus aggregations use le (less or equal) buckets, meaning that counting "fast" requests fractions is easier. However, simple maths helps. This is an example of a query you would like to alert on in your system for SLA violations, e.g. "less than 1% of requests are slower than 250ms".

Status

This code has been used since August 2015 as the basis for monitoring production gRPC microservices at Improbable.

License

go-grpc-prometheus is released under the Apache 2.0 license. See the LICENSE file for details.


go-grpc-prometheus's Issues

Support Namespace and Subsystem for ServerMetrics

After poking around with the ServerMetrics type, I was unable to modify the behavior of the NewServerMetrics() function in server_metrics.go.
Can someone help me understand how one can add Namespace/Subsystem CounterOpts to e.g. ServerMetrics.serverStartedCounter?
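
For reference, a minimal sketch of how this seems possible with the variadic CounterOption API (the option mutates each counter's prometheus.CounterOpts before creation); the next issue below proposes essentially the same thing:

customized := grpc_prometheus.NewServerMetrics(
	grpc_prometheus.CounterOption(func(o *prometheus.CounterOpts) {
		o.Namespace = "myapp" // hypothetical namespace
		o.Subsystem = "grpc"  // hypothetical subsystem
	}),
)
prometheus.MustRegister(customized) // *ServerMetrics is itself a prometheus.Collector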

Document how to register Server or change init() to register the same way

I needed to change the namespace, so I checked here first and found #29. It took a few minutes to figure out, because all the metrics are private on *Server, the init() registers them explicitly, and my grep found no other calls to MustRegister for a *Server type. After digging into *Server I realized it is itself a collector, so registering it directly worked.

To give this hint to users I propose changing:

func init() {
	prom.MustRegister(DefaultServerMetrics.serverStartedCounter)
	prom.MustRegister(DefaultServerMetrics.serverHandledCounter)
	prom.MustRegister(DefaultServerMetrics.serverStreamMsgReceived)
	prom.MustRegister(DefaultServerMetrics.serverStreamMsgSent)
}

To:

func init() {
	prom.MustRegister(DefaultServerMetrics)
}

Or adding a basic custom namespace example in the README if there is a technical reason the above would be incorrect:

myCustomMetrics := grpcprom.NewServerMetrics(
  grpcprom.CounterOption(func(o *prometheus.CounterOpts) {
    o.Namespace = "my_custom_namespace"
  }),
)
prom.MustRegister(myCustomMetrics)
srv := grpc.NewServer(...)
myCustomMetrics.InitializeMetrics(srv)

Is it possible to expose metrics only client-side?

The client-side example in the README shows how to add the client-side interceptors, but doesn't include anything about initializing the client-side metrics or exposing them via promhttp.Handler. I noticed in the docs that there's no client-side equivalent for the grpc_prometheus.Register function, so I'm curious if it's even possible to expose metrics only client-side?

I want to use this capability to instrument an instance of grpc-gateway that I run, which from a gRPC perspective is client-side only. Perhaps for it to work I need to use it in the server-side gRPC microservices the grpc-gateway talks to as well, and the client-side metrics will be exposed there? If there's an example of using this with grpc-gateway, that would be great too.
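
A minimal client-side-only sketch, under the assumption that the package-level DefaultClientMetrics are registered on the default Prometheus registry by init(), so no server-side Register call is needed:

clientConn, err := grpc.Dial(
	address, // your grpc-gateway backend target
	grpc.WithUnaryInterceptor(grpc_prometheus.UnaryClientInterceptor),
	grpc.WithStreamInterceptor(grpc_prometheus.StreamClientInterceptor),
)
if err != nil {
	log.Fatal(err)
}
defer clientConn.Close()
// The gateway process itself serves the client-side metrics.
http.Handle("/metrics", promhttp.Handler())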

Please export more code

Right now I use this code for unary RPCs:

// Unary adds context logger and Prometheus metrics to unary server RPC.
func Unary(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
    ctx = logger.Set(ctx, logger.MakeRequestID())
    return grpc_prometheus.UnaryServerInterceptor(ctx, req, info, handler)
}

It creates a derived context with a logger set and then calls your interceptor, which calls the handler. However, it is not possible to do the same for streams without copy-pasting half of the package. Please expose more types and functions to make this possible.
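
One workaround that appears to work without exporting more of the package: wrap grpc.ServerStream to override its context, then delegate to the exported StreamServerInterceptor. A sketch (wrappedStream is a hypothetical helper; logger is the same package as in the snippet above):

// wrappedStream overrides the stream's context.
type wrappedStream struct {
	grpc.ServerStream
	ctx context.Context
}

func (w *wrappedStream) Context() context.Context { return w.ctx }

// Stream adds a context logger, then hands off to the Prometheus interceptor.
func Stream(srv interface{}, ss grpc.ServerStream, info *grpc.StreamServerInfo, handler grpc.StreamHandler) error {
	ctx := logger.Set(ss.Context(), logger.MakeRequestID())
	return grpc_prometheus.StreamServerInterceptor(srv, &wrappedStream{ServerStream: ss, ctx: ctx}, info, handler)
}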

[IMPORTANT] Move this repository to go-grpc-middleware.v2

Hi,

As maintainers, we decided it would be healthy and much easier to maintain and improve this project if we joined the effort of creating the best middleware (interceptors) together with the go-grpc-middleware project. In fact, I am maintaining both of the projects. (:

We believe it can increase momentum and allow us to clean up old dead code that we cannot remove due to the compatibility guarantees of v1.

We propose the following:

  • Migrate prometheus middleware as a new metric interceptor for newly rewritten go-grpc-middleware.v2 as designed in grpc-ecosystem/go-grpc-middleware#275
  • Document it there.
  • Put this repository into maintenance mode and support v1 bugfixes only, for some time (not sure how long). All improvements and features should go to go-grpc-middleware.v2

Help wanted for the initial implementation of the Prometheus interceptor inside go-grpc-middleware.v2 🤗

cc @brancz

Unwrapping errors before converting to gRPC status

Hi, I discovered that when using error wrapping, the grpc_code is reported as Unknown. I encountered this issue in the Thanos project, where for example a context canceled error is normally reported as the Canceled gRPC status but, once wrapped, becomes Unknown.

The issue should eventually be solved in the go-grpc status package; there is even an issue for this: grpc/grpc-go#2934. The problem is that there are currently about three ways to wrap errors, and each does it differently.

I created a draft with a somewhat hacky unwrapping of two of those: native Go unwrapping (commented out, since it was added in Go 1.13 and I'm not sure how to deal with the dependency) and pkg/errors Wrapf.

Not sure what is the right way to handle this, but currently this is quite inconvenient.
I'd be glad for any opinions on this.

Thanks!
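
A minimal interim sketch, assuming Go 1.13+ errors.Unwrap: walk the wrap chain and restore the first gRPC status found, placed in the interceptor chain so the metrics interceptor sees the unwrapped status:

// unwrapStatusErr returns the first error in the Unwrap chain that
// carries a gRPC status; otherwise it returns err unchanged.
func unwrapStatusErr(err error) error {
	for e := err; e != nil; e = errors.Unwrap(e) {
		if s, ok := status.FromError(e); ok {
			return s.Err()
		}
	}
	return err
}

// unwrapUnary restores wrapped statuses before outer interceptors classify them.
func unwrapUnary(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
	resp, err := handler(ctx, req)
	return resp, unwrapStatusErr(err)
}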

Some tests fail

go test sometimes shows errors; it doesn't happen every time.

There may be errors like the ones below:

➜  go-grpc-prometheus git:(master) go test
--- FAIL: TestStreamingIncrementsHandled (0.00s)
        Error Trace:    server_test.go:218
        Error:          Not equal:
                        expected: 1
                        received: 0
        Messages:       grpc_server_handled_total should be incremented for PingList FailedPrecondition

--- FAIL: TestStreamingIncrementsStarted (0.00s)
        Error Trace:    server_test.go:168
        Error:          Not equal:
                        expected: 6
                        received: 5
        Messages:       grpc_server_started_total should be incremented for PingList

--- FAIL: TestServerInterceptorSuite (0.05s)
        server_test.go:86: stopped grpc.Server at: 127.0.0.1:51448
FAIL
exit status 1

expose number of active streams

We used to have a customized gRPC metrics system in etcd, and we want to change it to a more standard one. The only metric missing from this library is the number of active streams.

In etcd, users might keep long-running streams (watchers, lease keepalives), so it is important for us to monitor the number of active streams. Do you think this would be a useful metric in general?

Thanks!
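
Until/unless the library adds this, a sketch of how it can be maintained with a small custom stream interceptor chained next to this package's ones (the metric name is hypothetical):

var activeStreams = prometheus.NewGaugeVec(
	prometheus.GaugeOpts{Name: "grpc_server_active_streams"}, // hypothetical name
	[]string{"grpc_service", "grpc_method"},
)

func init() { prometheus.MustRegister(activeStreams) }

func countActiveStreams(srv interface{}, ss grpc.ServerStream, info *grpc.StreamServerInfo, handler grpc.StreamHandler) error {
	// info.FullMethod is "/package.Service/Method".
	parts := strings.SplitN(strings.TrimPrefix(info.FullMethod, "/"), "/", 2)
	g := activeStreams.WithLabelValues(parts[0], parts[1])
	g.Inc()
	defer g.Dec()
	return handler(srv, ss)
}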

pre-defining gRPC call metrics

Currently, the vector metrics are created on the fly, i.e. once a call is made, the corresponding metric labels are registered. So, for example, if I have a unary gRPC service "Foo", only after I make a call to "Foo" will the metrics with labels grpc_method=Foo grpc_service=FooService appear on Prometheus' /metrics endpoint.

Would it be possible to add a method/function to pre-warm metrics for the case when one needs the complete set of possible metrics up front? My use case is integration testing: while checking whether all the metrics are present/registered during an integration test, I need to account for all gRPC calls made during that test, which makes it flaky.

Prometheus Go client itself allows calling <metrics_object>.WithLabelValues(...) in order to achieve this.

Thanks in advance for a reply.
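
For what it's worth, Register / InitializeMetrics appear to do exactly this pre-warming when called after all service registrations, as in the README's server-side example. A sketch (pb and fooImpl are placeholders):

srv := grpc.NewServer(
	grpc.UnaryInterceptor(grpc_prometheus.UnaryServerInterceptor),
)
pb.RegisterFooServiceServer(srv, &fooImpl{}) // placeholder service registration
// Walks srv.GetServiceInfo() and initializes the per-method
// label combinations to 0 before any call is made.
grpc_prometheus.Register(srv)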

Add Summary metrics

I've POC'd adding summary metrics in a similar way as histograms. It's enabled via a similar method EnableHandlingTimeSummary(). Is there interest in me submitting a PR?

Motivation:
Our monitoring situation is in transition as we move to prometheus/grafana. We currently export metrics to a hosted solution via a prometheus remote adapter. In that hosted solution calculating a p99 from histogram buckets is not possible (normally you would use histogram_quantile())

Client Time Histogram for bidi_stream

It looks like after adding EnableClientHandlingTimeHistogram() I can collect counters for unary RPC messages, but there is no reporting for bidirectional streams.

Is this intentional, or a bug?

Option to reduce cardinality of grpc_server_handled_total

For each method, grpc_server_handled_total currently exposes ~17 label values, one per status code. In most cases, just a SUCCESS/FAILURE distinction would be enough and would reduce the number of exported series, which may be an issue with many RPC methods.

For a single data point: imagine a server with 20 RPC methods; each replica will then export about 400 series just for RPC statuses.

These 17 label values are on par with the histogram metric, which is disabled by default because of the same cardinality issue.

Support custom prometheus registerer

I want to use a custom registry, currently using go-grpc-prometheus in v1.2.0.
In #37 , @brancz suggested the following:

r := prometheus.NewRegistry()
grpcMetrics := grpc_prometheus.NewServerMetrics()
gs := grpc.NewServer(
	grpc.StreamInterceptor(grpcMetrics.StreamServerInterceptor()),
	grpc.UnaryInterceptor(grpcMetrics.UnaryServerInterceptor()),
)

as := api.NewAPIServer()
pb.RegisterAPIServer(gs, as)

grpcMetrics.Register(r)
grpcMetrics.InitializeMetrics(as)

But for v1.2.0, ServerMetrics hasn't got a Register() function.
Is there another way to use a custom registry?
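
One answer that seems to work on v1.2.0: ServerMetrics implements prometheus.Collector (it has Describe/Collect methods), so a custom registry can register it directly:

r := prometheus.NewRegistry()
grpcMetrics := grpc_prometheus.NewServerMetrics()
r.MustRegister(grpcMetrics) // works because *ServerMetrics is a Collector
http.Handle("/metrics", promhttp.HandlerFor(r, promhttp.HandlerOpts{}))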

Is this project maintained?

@mwitkow or any other author - there are outstanding PRs against this project, and no updates for a while.
Is this unmaintained?
Do let me know - if I do not hear back in a week, I will open a PR to indicate that the project is unmaintained, so that people with open PRs are not in limbo.

How to simplify grpc prometheus metrics

There are thousands of gRPC methods in my app, which makes a single metrics scrape larger than 16 MB.

I would like to customize the metrics (e.g. remove grpc_method label) to reduce its size.

Is there any doable way to achieve it?

Support custom prometheus registerer

Currently all metrics are exposed on the global registerer instance. It would be nice if one could provide a custom registerer instead of using the global one.

Average latency without histograms

Is there a way to enable grpc_server_handling_seconds_sum and grpc_server_handling_seconds_count, so one can do aggregations and computations of average latency:

grpc_server_handling_seconds_sum{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream"} 0.0003866430000000001
grpc_server_handling_seconds_count{grpc_code="OK",grpc_method="PingList",grpc_service="mwitkow.testproto.TestService",grpc_type="server_stream"} 1
...

without the full-blown histograms (which do have a storage cost on Prometheus)?

Even better would be the ability to also have these metrics without grpc_code, but I believe that is tracked in a separate issue.

PS. Also for client calls the same.
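
For comparison, once histograms are enabled, the average can already be derived from the existing sub-metrics, e.g.:

sum(rate(grpc_server_handling_seconds_sum{job="foo"}[5m])) by (grpc_service)
 /
sum(rate(grpc_server_handling_seconds_count{job="foo"}[5m])) by (grpc_service)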

Improve metric initialization to 0

@brian-brazil in #2 mentioned:

As I said above: sadly this is not possible from a technical point of view.
They could be initialised on every RPC. If we can't do that then we need to create a success and failure metric without labels, which will avoid the main pitfalls (with the downside of loss of granularity).

gRPC interceptors only see an RPC when it lands; they have no access to all registrations.

@brian-brazil, can you please clarify what you mean there? A single non-labeled metric? How will that help? What kind of pitfalls?

Version 2 - promgrpc merge

I'm the author of promgrpc (which, in fact, uses a lot of code from this repository). In reply to #37 (comment), I would like to know if you are open to incorporating my version into your codebase/organization.

More about differences can be found here: #37 (comment).

It would solve:

  • Decouple registration from grpc.Server implementation #17.
  • Implementing the grpc-go/stats.Handler interface #36.
  • Potentially #49 as well (if the name would be kept).

Travis: go versions?

Two questions:

  • Shouldn't we add the two supported (and therefore most important) Go versions go1.9 and go1.10 to Travis, too?
$ head .travis.yml -n 8
sudo: false
language: go
go:
  - 1.6.x
  - 1.7.x
  - 1.8.x
  - master

  • What about removing old versions? If we drop go1.6 we could e.g. import package context instead of the old golang.org/x/net/context as it was integrated into the standard library in go1.7.

excluding grpc.health.v1.Health from server metrics

I would like to exclude all metrics for grpc.health.v1.Health from the metrics collected from my gRPC server.
I looked through the code but couldn't figure out how to do it.

Is there a way to do it? Any pointers would be helpful.

Thanks
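
One approach, sketched under the assumption that you control the interceptor chain: wrap this package's interceptor and bypass it for the health service's full-method prefix (streams would need the analogous wrapper around StreamServerInterceptor):

// metricsExceptHealth skips instrumentation for grpc.health.v1.Health.
func metricsExceptHealth(ctx context.Context, req interface{}, info *grpc.UnaryServerInfo, handler grpc.UnaryHandler) (interface{}, error) {
	if strings.HasPrefix(info.FullMethod, "/grpc.health.v1.Health/") {
		return handler(ctx, req) // not instrumented
	}
	return grpc_prometheus.UnaryServerInterceptor(ctx, req, info, handler)
}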

Allow limiting grpc codes tracked in histogram

Latency alerts tend to focus on succeeding calls, and since errors can already be counted accurately, I would like to be able to exclude failing requests from the histogram metrics.

I am a bit torn though on what to do:

  1. Make this configurable, I don't really like this because suddenly the same histogram metric of different processes may behave totally different from the next.
  2. Change the default, this might cause unexpected behavior for users that are already using the library today.

Since I'm unable to ignore failed requests from the histogram at all today, I'm going to mark this as a bug, as it's essentially of no use as it is for alerting purposes.

Wdyt @bwplotka ?

heads up: codecov.io security incident - https://about.codecov.io/security-update/

Hi there.

This might be an unusual "issue" being reported.

There has been a security incident in codecov.io with the bash-uploader script (see [1] for details) which potentially exposed secrets to 3rd parties.

It seems you are using the referenced bash uploader in your .travis.yml file. I wanted to draw your attention to this incident in case you missed it.

It would be great if you could verify that no code has been altered and check the impact of this security incident on your repository.

Regards,
Robert

[1] https://about.codecov.io/security-update/

Potential ways to improve metrics

There are a few ways these metrics can be improved in line with best practices.

service is a common target label, so it is unwise to use it as an instrumentation label as it'll cause clashes for some users. I recommend grpc_service instead.

type is a very generic label name, try to find something more specific. This is to avoid half the metrics exposed by a binary having a type label, each with different meanings.

grpc_server_rpc_handled_bucket should include the seconds unit in the name

grpc_server_rpc_handled_bucket should not include code as a label, for two reasons. Firstly, "The list of all statuses is too long", which combined with the other 4 labels in this metric threatens to make the metric cardinality too high. Secondly, it is strongly recommended not to break out latency metrics by success/failure. Better would be a separate failures metric broken out by code; don't forget to initialise all possible label values to 0.

Next major version: Changes to consider

This issue is a wip collection of (compatibility-breaking) changes to consider for the next major version:

  • The package name currently is grpc_prometheus. However, the recommendation for Go package names is "Good package names are short and clear. They are lower case, with no under_scores or mixedCaps." Consequently golint complains:
client.go:6:1: don't use an underscore in package name
client_metrics.go:1:1: don't use an underscore in package name
client_reporter.go:4:1: don't use an underscore in package name
client_test.go:4:1: don't use an underscore in package name
metric_options.go:1:1: don't use an underscore in package name
server.go:6:1: don't use an underscore in package name
server_metrics.go:1:1: don't use an underscore in package name
server_reporter.go:4:1: don't use an underscore in package name
server_test.go:4:1: don't use an underscore in package name
util.go:4:1: don't use an underscore in package name
  • Consider unexporting the four constants in util.go (golint):
    util.go:16:2: exported const Unary should have comment (or a comment on this block) or be unexported

grpc_client_msg_received_total is only counted on server-side error

Hi,

Today I tested the client-side metrics and found that grpc_client_msg_received_total does not show up in the metrics as expected: it is only counted if the client's invocation of the server failed.

func (m *ClientMetrics) UnaryClientInterceptor() func(ctx context.Context, method string, req, reply interface{}, cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
	return func(ctx context.Context, method string, req, reply interface{}, cc *grpc.ClientConn, invoker grpc.UnaryInvoker, opts ...grpc.CallOption) error {
		monitor := newClientReporter(m, Unary, method)
		monitor.SentMessage()
		err := invoker(ctx, method, req, reply, cc, opts...)
		if err != nil { // note: the received-message counter only fires on error here
			monitor.ReceivedMessage()
		}
		st, _ := status.FromError(err)
		monitor.Handled(st.Code())
		return err
	}
}

Thanks

Metrics not updated automatically

ctx, cancel := context.WithCancel(ctx)
defer cancel()

srv := &http.Server{
	Addr:    config.ServerHost + ":" + config.PromthesiusPort,
	Handler: promhttp.HandlerFor(promthesusConfig.Registry, promhttp.HandlerOpts{}),
}
//initializing metrics
promthesusConfig.ServerMetrics.InitializeMetrics(grpcServer)
grpc_prometheus.Register(grpcServer)
// graceful shutdown
c := make(chan os.Signal, 1)
signal.Notify(c, os.Interrupt)
go func() {
	for range c {
		// sig is a ^C, handle it
	}
	_, cancel := context.WithTimeout(ctx, 30*time.Second)
	defer cancel()
	logger.Log.Info("graceful shutting down promthesius Server")
	_ = srv.Shutdown(ctx)
}()

When I open the port, it is already initialized with the default metrics, but when we hit the server with a new request, the values are not updated automatically. After refreshing, the same problem persists.

New release

Based on the discussion in #37 I looked through the commits since the v1.1 release, and the actual changes done to the library since that release seem to be:

  • Variadic counter options: #25
  • Allowing non global metrics registries: #20

The one thing I'm struggling with: since we have never released the non-global metrics registry code, I feel we should just remove the non-standard Register and MustRegister methods on the ClientMetrics and ServerMetrics structs.

I would suggest to remove the non standard methods and then do a v1.2 release, as there are no breaking changes.

@Bplotka @fengzixu

Let me know what you think, and I encourage everyone to double check my statement about backward compatibility to ensure we're not falsely pushing out a new minor version if it's not actually compatible.

Regarding your list of linting errors, @knweiss: we should probably look at each of the failures in detail, but I'd like to get the backward-compatible changes out first. Could you create a separate issue for the linting failures so we can look at them and decide separately?

How to customize grpc metrics

There are thousands of gRPC methods in my app, which makes a single metrics scrape larger than 16 MB.

I would like to customize the metrics (e.g. remove grpc_method label) to reduce its size.

Is it doable?

1.2.0: TestServerInterceptorSuite failures

As reported in https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=902466
TestServerInterceptorSuite sometimes fails as follows:

--- FAIL: TestServerInterceptorSuite (0.73s)
    --- PASS: TestServerInterceptorSuite/TestRegisterPresetsStuff (0.18s)
    --- FAIL: TestServerInterceptorSuite/TestStreamingIncrementsHandled (0.08s)
    	server_test.go:219: 
    			Error Trace:	server_test.go:219
    			Error:      	Not equal: 
    			            	expected: 1
    			            	actual  : 0
    			Test:       	TestServerInterceptorSuite/TestStreamingIncrementsHandled
    			Messages:   	grpc_server_handled_total should be incremented for PingList FailedPrecondition
    --- FAIL: TestServerInterceptorSuite/TestStreamingIncrementsHistograms (0.11s)
    	server_test.go:194: 
    			Error Trace:	server_test.go:194
    			Error:      	Not equal: 
    			            	expected: 4
    			            	actual  : 3
    			Test:       	TestServerInterceptorSuite/TestStreamingIncrementsHistograms
    			Messages:   	grpc_server_handling_seconds_count should be incremented for PingList FailedPrecondition
    --- PASS: TestServerInterceptorSuite/TestStreamingIncrementsMessageCounts (0.08s)
    --- PASS: TestServerInterceptorSuite/TestStreamingIncrementsStarted (0.03s)
    --- PASS: TestServerInterceptorSuite/TestUnaryIncrementsHandled (0.07s)
    --- PASS: TestServerInterceptorSuite/TestUnaryIncrementsHistograms (0.06s)
    --- PASS: TestServerInterceptorSuite/TestUnaryIncrementsStarted (0.07s)
	server_test.go:87: stopped grpc.Server at: 127.0.0.1:33609
FAIL

https://tests.reproducible-builds.org/debian/rb-pkg/buster/amd64/golang-github-grpc-ecosystem-go-grpc-prometheus.html

docs inconsistency: Histograms doesn't include grpc_code

Hi,

unlike the documentation in the README file, it appears that grpc_code is not included as a label in the histogram.

grpc_server_handling_seconds_bucket - contains the counts of RPCs by status and method in respective handling-time buckets. These buckets can be used by Prometheus to estimate SLAs (see here)
// EnableHandlingTimeHistogram enables histograms being registered when
// registering the ServerMetrics on a Prometheus registry. Histograms can be
// expensive on Prometheus servers. It takes options to configure histogram
// options such as the defined buckets.
func (m *ServerMetrics) EnableHandlingTimeHistogram(opts ...HistogramOption) {
	for _, o := range opts {
		o(&m.serverHandledHistogramOpts)
	}
	if !m.serverHandledHistogramEnabled {
		m.serverHandledHistogram = prom.NewHistogramVec(
			m.serverHandledHistogramOpts,
			[]string{"grpc_type", "grpc_service", "grpc_method"},
		)
	}
	m.serverHandledHistogramEnabled = true
}

The grpc_code is not in the predefined labels.

Is there a way to add grpc_code, or maybe it would be nice to correct the README?

Thanks

Support custom grpc_code

Hi! Is there a way to set custom grpc error codes?

The allCodes var is here: grpc-ecosystem/go-grpc-prometheus/util.go.

Add optional histogram of msg size.

It would be useful to see e.g. the average message size per frame.

If we want this, then as acceptance criteria, add optional histograms (see the sketch after this list):

  • grpc_server_msg_size_received_bytes
  • grpc_client_msg_size_received_bytes
  • grpc_server_msg_size_sent_bytes
  • grpc_client_msg_size_sent_bytes
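
A rough sketch of the server-side sent-size variant, assuming a wrapped ServerStream and proto.Size; the metric name is taken from the list above, the buckets are illustrative:

var sentBytes = prometheus.NewHistogram(prometheus.HistogramOpts{
	Name:    "grpc_server_msg_size_sent_bytes",
	Buckets: prometheus.ExponentialBuckets(64, 4, 8), // 64 B .. ~1 MB
})

type sizeObservingStream struct {
	grpc.ServerStream
}

func (s *sizeObservingStream) SendMsg(m interface{}) error {
	if msg, ok := m.(proto.Message); ok {
		sentBytes.Observe(float64(proto.Size(msg)))
	}
	return s.ServerStream.SendMsg(m)
}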
