census-ecosystem / opencensus-go-exporter-stackdriver
OpenCensus Go exporter for Stackdriver Monitoring and Trace
License: Apache License 2.0
Added in census-instrumentation/opencensus-go#252
This label seems like it will have high cardinality and simultaneously not be very useful. Is there a way we can avoid this? Does exporting deltas instead of cumulative help? Or if we know for sure that the monitored resource is unique, can we provide an option to disable setting this label?
/cc @songy23
Currently in newStatsExporter we record seenProjects[o.ProjectID] = true and check that we only call this function once per project ID.
This makes it inconvenient to use the exporter in situations where it might be dynamically created at runtime, for example in an Istio adapter.
We should remove this enforcement and just document that creating multiple exporters with the same project ID and monitored resource in the same process is not supported if you register as a stats exporter.
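The guard described above can be sketched roughly as follows; the variable and function shapes mirror the description, but are illustrative rather than the exporter's exact code:

```go
package main

import (
	"errors"
	"fmt"
)

// seenProjects records which project IDs already have an exporter,
// mirroring the guard in newStatsExporter described above.
var seenProjects = map[string]bool{}

// newStatsExporter refuses a second exporter for the same project ID.
// The proposal is to drop this guard and document the limitation instead.
func newStatsExporter(projectID string) error {
	if seenProjects[projectID] {
		return errors.New("stackdriver: an exporter already exists for project " + projectID)
	}
	seenProjects[projectID] = true
	return nil
}

func main() {
	fmt.Println(newStatsExporter("demo")) // first call succeeds
	fmt.Println(newStatsExporter("demo")) // second call returns an error
}
```

Removing the guard would make the second call succeed as well, at the cost of allowing duplicate time series if both exporters are registered for stats.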
Context: https://cloud.google.com/monitoring/api/v3/metrics-details
Seems like LastValue should map to a GAUGE type metric since it represents an instantaneous value.
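A minimal sketch of the proposed mapping; the function and the aggregation strings are hypothetical names for illustration, not the exporter's internals:

```go
package main

import "fmt"

// metricKind maps an OpenCensus aggregation type to a Stackdriver
// MetricKind. LastValue is an instantaneous value, so it maps to GAUGE;
// Count, Sum and Distribution accumulate over time, so they map to
// CUMULATIVE. The names here are illustrative, not the exporter's API.
func metricKind(aggregation string) string {
	switch aggregation {
	case "LastValue":
		return "GAUGE"
	case "Count", "Sum", "Distribution":
		return "CUMULATIVE"
	default:
		return "METRIC_KIND_UNSPECIFIED"
	}
}

func main() {
	fmt.Println(metricKind("LastValue")) // GAUGE
	fmt.Println(metricKind("Count"))     // CUMULATIVE
}
```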
Is it ok that the package depends on 'github.com/aws/aws-sdk-go'?
The total time to fetch all dependencies in Google Cloud Builder is significant.
Raised in a PR by @songy23 #74 (comment)
IIRC in this example each "a/b/c" needs to be written in time order, otherwise SD will return another error.
This issue is a spin-off to track a potential issue and fix it if need be, separately from the massive PR #74
Hello, I am trying to test OpenCensus with my Stackdriver project.
With the example code, I always get an unauthenticated error when I record stats.
After some debugging I removed the ProjectID so that the default credentials would be used, because when it is set the exporter does not authenticate. But I still get the same error: the context.Context is empty after authentication, and recording always fails.
I have tested my credentials with the Google example (https://github.com/GoogleCloudPlatform/golang-samples/blob/master/monitoring/monitoring_quickstart/main.go) and everything works.
Can you help me understand why?
This is on stackdriver exporter 0.8.0 and opencensus 0.18.0.
Hi,
I'm pretty sure I'm doing something wrong, but I can't seem to get it to export as a gauge.
Versions:
Locking in v0.14.0 (e262766) for direct dep go.opencensus.io
Locking in v0.5.0 (37aa280) for transitive dep contrib.go.opencensus.io/exporter/stackdriver
Sample:
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"go.opencensus.io/exporter/stackdriver"
	"go.opencensus.io/stats"
	"go.opencensus.io/stats/view"
)

var (
	numOpenMeasure = stats.Int64("opencensus.io/test/num_open_test_measure_v3", "Number open connections", stats.UnitDimensionless)
)

func main() {
	exporter, err := stackdriver.NewExporter(stackdriver.Options{
		ProjectID: "xxx"})
	if err != nil {
		log.Fatalln(err)
	}
	view.RegisterExporter(exporter)
	numOpenView := &view.View{
		Name:        "opencensus.io/test/num_open_test_view_v3",
		Description: "Number open connections",
		Measure:     numOpenMeasure,
		Aggregation: view.LastValue(),
	}
	if err := view.Register(numOpenView); err != nil {
		log.Fatal(err)
	}
	numOpen := int64(50)
	for {
		stats.Record(context.Background(), numOpenMeasure.M(numOpen))
		fmt.Println("Num open: ", numOpen)
		time.Sleep(5 * time.Second)
		numOpen++
	}
}
Any ideas?
Thanks!
It is currently impossible to set an empty MetricPrefix. Setting this field to an empty string causes it to assume the default value of "OpenCensus".
In many cases (perhaps most), the default view names are already well namespaced enough to not require any global prefix. For example, this is what the gRPC metrics look like with a prefix of "testapp":
vendor/contrib.go.opencensus.io/exporter/stackdriver/monitoredresource/aws_identity_doc_utils.go
has imports:
18: "github.com/aws/aws-sdk-go/aws/ec2metadata"
19: "github.com/aws/aws-sdk-go/aws/session"
but importing stackdriver/monitoredresource
causes me to vendor github.com/aws/aws-sdk-go/aws
in my godep package.
I'm not using aws at all, so it's not right for me to have aws sdk end up in my final binary.
There are ways to avoid this such as building a wrapper that satisfies an interface.
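One hedged sketch of that interface approach: the heavy AWS detection logic would live behind a small interface in an opt-in sub-package, so only programs that import that sub-package pull in aws-sdk-go. All names below (Detector, gceDetector, describe) are hypothetical:

```go
package main

import "fmt"

// Detector abstracts monitored-resource detection so the core package
// needs no cloud-provider SDKs. A hypothetical AWS sub-package would
// provide an implementation backed by aws-sdk-go; importing it (and
// thus vendoring the SDK) would be opt-in.
type Detector interface {
	Detect() (resourceType string, labels map[string]string)
}

// gceDetector is a stand-in implementation with no external dependencies.
type gceDetector struct{}

func (gceDetector) Detect() (string, map[string]string) {
	return "gce_instance", map[string]string{"zone": "us-central1-a"}
}

// describe shows the core package working purely against the interface.
func describe(d Detector) string {
	t, _ := d.Detect()
	return t
}

func main() {
	fmt.Println(describe(gceDetector{})) // gce_instance
}
```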
Currently we create Stackdriver MetricDescriptors lazily at export time. Instead, we should provide a method to do this explicitly and return any errors that occurred. This can be used to detect schema mismatches eagerly, rather than having to wait for the export path to run.
I've been using Stackdriver Trace for some time via the zipkin-gcp exporter, but have recently been experimenting with Istio, which sends spans to Stackdriver via this exporter. Istio lets you configure how your spans will look, including their name. I spent a bunch of time racking my brain trying to figure out why Istio was prefixing my client spans with "Sent." and my server spans with "Recv." before tracking it down to this exporter.
Is this a convention I'm unaware of? It was quite unexpected, and the period delimiter could be confusing given that I want my span names to be derived from the HTTP Host header. Assuming the Host header is example.org, the spans would appear as "Sent.example.org" and "Recv.example.org" in the Stackdriver Trace UI.
There appears to be no recommendation here for a suitable reporting period for the Monitoring API. The sample code sets it to one second:
view.SetReportingPeriod(1 * time.Second)
If this corresponds to the period of the time series data sent to the Monitoring API, it is way too short. Best practice is a period of 60 seconds.
Apologies if I have misunderstood and this does not actually determine the period between points in a time series. I am not very familiar with OpenCensus.
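If the reporting period does control the interval between exported points, the fix on the caller's side would be a one-line configuration change (view.SetReportingPeriod is the real OpenCensus API; the 60-second value follows the best practice quoted above):

```go
// Export at most one point per time series per minute,
// matching Stackdriver Monitoring's recommended 60-second period.
view.SetReportingPeriod(60 * time.Second)
```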
According to the documentation, Flush "waits for exported view data to be uploaded. This is useful if your program is ending and you do not want to lose recent spans."
This is supposed to apply to stats, yet the comment mentions spans. If it is supposed to work for stats, it is not working.
Here is my test case:
package main

import (
	"context"
	"fmt"
	"log"
	"time"

	"go.opencensus.io/exporter/stackdriver"
	"go.opencensus.io/stats"
	"go.opencensus.io/stats/view"
	"go.opencensus.io/tag"
)

var MerrorCount = stats.Int64("razvan.test/measures/error_count", "number of errors encountered", "1")

var (
	ErrorCountView = &view.View{
		Name:        "demo/razvan/rep_period_test",
		Measure:     MerrorCount,
		Description: "Testing various reporting periods",
		Aggregation: view.Count(),
	}
)

func main() {
	KeyMethod, _ := tag.NewKey("test")
	ctx := context.Background()
	ctx, _ = tag.New(ctx, tag.Insert(KeyMethod, "30 sec reporting"))
	// Register the views
	if err := view.Register(ErrorCountView); err != nil {
		log.Fatalf("Failed to register views: %v", err)
	}
	// SD Exporter
	sd, err := stackdriver.NewExporter(stackdriver.Options{
		ProjectID: "opencensus-test",
		// MetricPrefix helps uniquely identify your metrics.
		MetricPrefix: "opencensus-test",
	})
	if err != nil {
		log.Fatalf("Failed to create the Stackdriver exporter: %v", err)
	}
	// It is imperative to invoke flush before your main function exits
	defer sd.Flush()
	view.RegisterExporter(sd)
	// Set reporting period to report data every 60 seconds.
	view.SetReportingPeriod(60000 * time.Millisecond)
	ticker := time.NewTicker(1000 * time.Millisecond)
	i := 0
	go func() {
		for range ticker.C {
			stats.Record(ctx, MerrorCount.M(1))
			i++
		}
	}()
	runtime := 65000
	time.Sleep(time.Duration(runtime) * time.Millisecond)
	fmt.Printf("Incremented %d times\n", i)
}
I only get the value exported after 60 seconds.
I'm running an app on App Engine where the runtime is go111, and the stats exporter does not work out of the box; I get an
rpc error: code = Internal desc = One or more TimeSeries could not be written: An internal error occurred.: timeSeries[0-5]
error. This is because multiple instances try to write to the same time series.
By default, the exporter tags stats with hostname and pid to make the destination time series unique (https://github.com/census-ecosystem/opencensus-go-exporter-stackdriver/blob/master/stats.go#L152). However, hostname and pid are all the same on App Engine.
We can use the GAE_INSTANCE environment variable, which is available on every App Engine instance and is unique across instances.
Is it OK for you guys to add App Engine-specific code (it would only rely on the os package, not google.golang.org/appengine)? If so, I'm willing to send a PR. Thanks!
In order to use a custom service key, users can set the GOOGLE_APPLICATION_CREDENTIALS env variable. The exporter should document this capability in the godoc.
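For reference, the documented usage would amount to a single environment variable; the key path below is a placeholder:

```shell
# Application Default Credentials pick this up automatically,
# so the exporter authenticates with the given service account key.
export GOOGLE_APPLICATION_CREDENTIALS=/path/to/service-account-key.json
```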
A bug/inadequacy that I found while doing a live test with the OpenCensus Agent. If metrics are streamed from multiple sources and more than one metric in a single export batch shares the same name, Stackdriver's backend returns an error, because CreateTimeSeriesRequest expects unique time series. This problem has plagued even the stats exporter for years; the advice/work-around was to set view.SetReportingPeriod, but that just masked the problem, because it gave time for aggregation to occur within an exporting period.
When metrics are streamed in concurrently, all bets are off. For example, given this data:
{
  "name": "projects/census-demos",
  "time_series": [
    {
      "metric": {
        "type": "custom.googleapis.com/opencensus/oce/dev/latency",
        "labels": {
          "client": "cli",
          "method": "repl",
          "opencensus_task": "[email protected]"
        }
      },
      "resource": { "type": "global" },
      "points": [
        {
          "interval": {
            "end_time": { "seconds": 1547860312, "nanos": 655706000 },
            "start_time": { "seconds": 1547857792, "nanos": 658197000 }
          },
          "value": {
            "Value": {
              "DistributionValue": {
                "count": 399,
                "mean": 6461.507067283209,
                "sum_of_squared_deviation": 5680369911.614502,
                "bucket_options": {
                  "Options": {
                    "ExplicitBuckets": {
                      "bounds": [0, 10, 50, 100, 200, 400, 800, 1000, 1400, 2000, 5000, 10000]
                    }
                  }
                },
                "bucket_counts": [0, 0, 1, 0, 3, 3, 21, 5, 17, 15, 89, 153, 92]
              }
            }
          }
        }
      ]
    },
    {
      "metric": {
        "type": "custom.googleapis.com/opencensus/oce/dev/process_counts",
        "labels": {
          "client": "cli",
          "method": "repl",
          "opencensus_task": "[email protected]"
        }
      },
      "resource": { "type": "global" },
      "points": [
        {
          "interval": {
            "end_time": { "seconds": 1547860312, "nanos": 655722000 },
            "start_time": { "seconds": 1547857792, "nanos": 658197000 }
          },
          "value": { "Value": { "Int64Value": 399 } }
        }
      ]
    },
    {
      "metric": {
        "type": "custom.googleapis.com/opencensus/oce/dev/latency",
        "labels": {
          "client": "cli",
          "method": "repl",
          "opencensus_task": "[email protected]"
        }
      },
      "resource": { "type": "global" },
      "points": [
        {
          "interval": {
            "end_time": { "seconds": 1547860372, "nanos": 653868000 },
            "start_time": { "seconds": 1547857792, "nanos": 658197000 }
          },
          "value": {
            "Value": {
              "DistributionValue": {
                "count": 409,
                "mean": 6443.895616823964,
                "sum_of_squared_deviation": 5882240635.357754,
                "bucket_options": {
                  "Options": {
                    "ExplicitBuckets": {
                      "bounds": [0, 10, 50, 100, 200, 400, 800, 1000, 1400, 2000, 5000, 10000]
                    }
                  }
                },
                "bucket_counts": [0, 0, 1, 0, 4, 3, 22, 6, 17, 16, 90, 156, 94]
              }
            }
          }
        }
      ]
    }
  ]
}
we get an error
err: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Field timeSeries[2] had an invalid value: Duplicate TimeSeries encountered. Only one point can be written per TimeSeries per request.: timeSeries[2]
because we've got both
{
  "metric": {
    "type": "custom.googleapis.com/opencensus/oce/dev/latency",
    "labels": {
      "client": "cli",
      "method": "repl",
      "opencensus_task": "[email protected]"
    }
  },
  "resource": { "type": "global" },
  "points": [
    {
      "interval": {
        "end_time": { "seconds": 1547860312, "nanos": 655706000 },
        "start_time": { "seconds": 1547857792, "nanos": 658197000 }
      },
      "value": {
        "Value": {
          "DistributionValue": {
            "count": 399,
            "mean": 6461.507067283209,
            "sum_of_squared_deviation": 5680369911.614502,
            "bucket_options": {
              "Options": {
                "ExplicitBuckets": {
                  "bounds": [0, 10, 50, 100, 200, 400, 800, 1000, 1400, 2000, 5000, 10000]
                }
              }
            },
            "bucket_counts": [0, 0, 1, 0, 3, 3, 21, 5, 17, 15, 89, 153, 92]
          }
        }
      }
    }
  ]
}
and
{
  "metric": {
    "type": "custom.googleapis.com/opencensus/oce/dev/latency",
    "labels": {
      "client": "cli",
      "method": "repl",
      "opencensus_task": "[email protected]"
    }
  },
  "resource": { "type": "global" },
  "points": [
    {
      "interval": {
        "end_time": { "seconds": 1547860372, "nanos": 653868000 },
        "start_time": { "seconds": 1547857792, "nanos": 658197000 }
      },
      "value": {
        "Value": {
          "DistributionValue": {
            "count": 409,
            "mean": 6443.895616823964,
            "sum_of_squared_deviation": 5882240635.357754,
            "bucket_options": {
              "Options": {
                "ExplicitBuckets": {
                  "bounds": [0, 10, 50, 100, 200, 400, 800, 1000, 1400, 2000, 5000, 10000]
                }
              }
            },
            "bucket_counts": [0, 0, 1, 0, 4, 3, 22, 6, 17, 16, 90, 156, 94]
          }
        }
      }
    }
  ]
}
which share the metric type "custom.googleapis.com/opencensus/oce/dev/latency".
I'm using the OpenCensus Stackdriver exporter in a container running on GKE. I use cloud.google.com/go/compute/metadata to get the ProjectID and pass it to the exporter. Sometimes I get the following errors when my container starts in a pod:
2019/02/14 21:10:04 Failed to export to Stackdriver: context deadline exceeded
2019/02/14 21:10:04 Failed to export to Stackdriver: rpc error: code = Unavailable desc = all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: authentication handshake failed: read tcp 10.32.4.155:48788->74.125.129.95:443: read: connection reset by peer"
If I delete the pod and let GKE recreate a new one without doing anything else, sometimes it works again.
What could be the reason that I sometimes get these errors and sometimes not?
Stackdriver Monitoring has added support for Exemplar in the latest client libraries. Modify our Stackdriver stats exporter to support it.
Java counterpart: census-instrumentation/opencensus-java#1771.
There are ways to get the namespace and the pod name to match up with the GKE example, but there doesn't seem to be a way to get something matching the container name.
Maybe that label should be removed from the example? (Or, since it's opaque enough that there are open GitHub issues on Kubernetes about it, add some words describing how to do it?)
I have a simple Pub/Sub app using the Stackdriver exporter and some views. When I go to Stackdriver, I see my stats but without any resource (I expect "global"). I put some logging around handleUpload and noticed it's erroring out:
2018/09/11 15:28:45 Failed to export to Stackdriver: rpc error: code = InvalidArgument desc = Field timeSeries[1].points[0].distributionValue had an invalid value: Distribution |explicit_buckets.bounds| does not have at least one entry.
PublishSuccessMillisView looks like this:
PublishSuccessMillis = stats.Int64(statsPrefix+"publish_success_millis", "Number of milliseconds to publish a message", stats.UnitMilliseconds)
...
PublishSuccessMillisView *view.View = distView(PublishSuccessMillis)
...
func distView(m *stats.Int64Measure) *view.View {
	return &view.View{
		Name:        m.Name(),
		Description: m.Description(),
		TagKeys:     []tag.Key{subscriptionKey},
		Measure:     m,
		Aggregation: view.Distribution(),
	}
}
It gets used as such:
millis := end.Sub(start).Nanoseconds() / 1000000
stats.Record(ctx, PublishSuccessMillis.M(millis))
Actual:
We call createMeasure before createTimeSeries when reporting metrics data. If createMeasure returns an error, the whole reporting process fails.
Expected:
Drop the bad View data if createMeasure returns an error, and let the remaining data be sent to Stackdriver.
$ go get -u contrib.go.opencensus.io/exporter/stackdriver
import cycle not allowed
package contrib.go.opencensus.io/exporter/stackdriver
	imports cloud.google.com/go/monitoring/apiv3
	imports google.golang.org/api/transport
	imports google.golang.org/api/transport/http
	imports contrib.go.opencensus.io/exporter/stackdriver
Reported offline by a few users. Currently we only allow setting MonitoredResource when creating the exporter, and users cannot update the resource later. While in most cases we expect the MonitoredResource to stay the same throughout the application lifetime, there may be cases where users want to associate different MonitoredResources with different metric batches. Consider supporting a dynamic MonitoredResource when uploading metrics to Stackdriver.
Another reference: OC-Agent metrics protocol also supports dynamic resources:
// The resource for the metrics in this message that do not have an explicit
// resource set.
// If unset, the most recently set resource in the RPC stream applies. It is
// valid to never be set within a stream, e.g. when no resource info is known
// at all or when all sent metrics have an explicit resource set.
After upgrading from 0.7.0 to the latest master, I started seeing occasional errors like this from CreateTimeSeries:
rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Field timeSeries[1] had an invalid value: Duplicate TimeSeries encountered. Only one point can be written per TimeSeries per request.: timeSeries[1]
Here's a sample request matching the error message above (slightly reformatted):
name:"projects/xxxxx"
time_series:<metric:<type:"custom.googleapis.com/opencensus/ts_bridge/metric_import_latencies" labels:<key:"metric_name" value:"sli_sample_ratio10m" > labels:<key:"opencensus_task" value:"go-2@localhost" > > resource:<type:"global" > points:<interval:<end_time:<seconds:1539597540 nanos:971130331 > start_time:<seconds:1539597540 nanos:971119624 > > value:<distribution_value:<count:1 mean:602 bucket_options:<explicit_buckets:<bounds:100 bounds:250 bounds:500 bounds:1000 bounds:2000 bounds:3000 bounds:4000 bounds:5000 bounds:7500 bounds:10000 bounds:15000 bounds:20000 bounds:40000 bounds:60000 bounds:90000 bounds:120000 bounds:300000 bounds:600000 > > bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:1 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 > > > >
time_series:<metric:<type:"custom.googleapis.com/opencensus/ts_bridge/metric_import_latencies" labels:<key:"metric_name" value:"sli_sample_ratio10m" > labels:<key:"opencensus_task" value:"go-2@localhost" > > resource:<type:"global" > points:<interval:<end_time:<seconds:1539597541 nanos:703625748 > start_time:<seconds:1539597540 nanos:971119624 > > value:<distribution_value:<count:1 mean:602 bucket_options:<explicit_buckets:<bounds:100 bounds:250 bounds:500 bounds:1000 bounds:2000 bounds:3000 bounds:4000 bounds:5000 bounds:7500 bounds:10000 bounds:15000 bounds:20000 bounds:40000 bounds:60000 bounds:90000 bounds:120000 bounds:300000 bounds:600000 > > bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:1 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 > > > >
time_series:<metric:<type:"custom.googleapis.com/opencensus/ts_bridge/metric_import_latencies" labels:<key:"metric_name" value:"total_bytes_rcvd" > labels:<key:"opencensus_task" value:"go-2@localhost" > > resource:<type:"global" > points:<interval:<end_time:<seconds:1539597541 nanos:703625748 > start_time:<seconds:1539597540 nanos:971119624 > > value:<distribution_value:<count:1 mean:718 bucket_options:<explicit_buckets:<bounds:100 bounds:250 bounds:500 bounds:1000 bounds:2000 bounds:3000 bounds:4000 bounds:5000 bounds:7500 bounds:10000 bounds:15000 bounds:20000 bounds:40000 bounds:60000 bounds:90000 bounds:120000 bounds:300000 bounds:600000 > > bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:1 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 > > > >
time_series:<metric:<type:"custom.googleapis.com/opencensus/ts_bridge/metric_import_latencies" labels:<key:"metric_name" value:"total_bytes_sent" > labels:<key:"opencensus_task" value:"go-2@localhost" > > resource:<type:"global" > points:<interval:<end_time:<seconds:1539597541 nanos:703625748 > start_time:<seconds:1539597540 nanos:971119624 > > value:<distribution_value:<count:1 mean:735 bucket_options:<explicit_buckets:<bounds:100 bounds:250 bounds:500 bounds:1000 bounds:2000 bounds:3000 bounds:4000 bounds:5000 bounds:7500 bounds:10000 bounds:15000 bounds:20000 bounds:40000 bounds:60000 bounds:90000 bounds:120000 bounds:300000 bounds:600000 > > bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:1 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 > > > >
time_series:<metric:<type:"custom.googleapis.com/opencensus/ts_bridge/import_latencies" labels:<key:"opencensus_task" value:"go-2@localhost" > > resource:<type:"global" > points:<interval:<end_time:<seconds:1539597541 nanos:703638673 > start_time:<seconds:1539597540 nanos:971119624 > > value:<distribution_value:<count:1 mean:1337 bucket_options:<explicit_buckets:<bounds:100 bounds:250 bounds:500 bounds:1000 bounds:2000 bounds:3000 bounds:4000 bounds:5000 bounds:7500 bounds:10000 bounds:15000 bounds:20000 bounds:40000 bounds:60000 bounds:90000 bounds:120000 bounds:300000 bounds:600000 > > bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:1 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 bucket_counts:0 > > > >
time_series:<metric:<type:"custom.googleapis.com/opencensus/ts_bridge/oldest_metric_age" labels:<key:"opencensus_task" value:"go-2@localhost" > > resource:<type:"global" > points:<interval:<end_time:<seconds:1539597541 nanos:703641788 > > value:<int64_value:479964 > > >
As you can see, timeSeries[0] and timeSeries[1] are identical except for the end_time (which is less than a second apart).
Stackdriver requires at most one data point per request per time series. I think the exporter will need either to only send the latest point (discarding earlier ones), or use several separate CreateTimeSeries calls per time series.
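The second option could look roughly like this: partition the batch so that no two points with the same metric type and labels end up in the same request. splitDuplicates, Series, and the key format are illustrative, not the exporter's code:

```go
package main

import "fmt"

// Series identifies one point of a time series by a key built from the
// metric type plus its label values, e.g. "latency|task=go-2@localhost".
type Series struct {
	Key   string
	Point int64 // stand-in for the actual point payload
}

// splitDuplicates groups a batch into sub-batches such that each
// sub-batch contains at most one point per series key, satisfying
// Stackdriver's one-point-per-series-per-request rule. Each sub-batch
// would become its own CreateTimeSeries call.
func splitDuplicates(batch []Series) [][]Series {
	var requests [][]Series
	for _, s := range batch {
		placed := false
		for i, req := range requests {
			if !containsKey(req, s.Key) {
				requests[i] = append(req, s)
				placed = true
				break
			}
		}
		if !placed {
			requests = append(requests, []Series{s})
		}
	}
	return requests
}

func containsKey(req []Series, key string) bool {
	for _, s := range req {
		if s.Key == key {
			return true
		}
	}
	return false
}

func main() {
	batch := []Series{{"latency|a", 1}, {"latency|a", 2}, {"bytes|a", 3}}
	fmt.Println(len(splitDuplicates(batch))) // 2
}
```

Ordering matters for cumulative series: the earlier point for a duplicated series should go in the earlier request, which the insertion order above preserves.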
I believe I did not see such errors on 0.7.0. I have not examined the diff between 0.7.0 and master very closely, but looking at the list of commits, 2f26a5d seems the most suspicious.
The traces do not currently span across the different services in my setup. I am currently using GKE cluster with the following setup for my endpoint GRPC. I have written a wrapping function that is stored in the package middleware. The dialer and the GRPC server setups are shown below.
I have looked at some resources. Mainly the following two links:
The first is empty and requires additional documentation but I thought it would be what I was looking for. The second uses the old stackdriver trace package, but the author wraps the logic with his own to pass on the headers. Is that required?
grpcServer := grpc.NewServer(unaryInterceptor, tracer.ServerOptionPublicFacing())

unaryInterceptorBack := grpc.UnaryInterceptor(grpcMiddleware.ChainUnaryServer(
	grpcLogging.UnaryServerInterceptor(log.NewEntry(l)),
	grpcRecovery.UnaryServerInterceptor(),
))
grpc.Dial(c.Services.Address, grpc.WithInsecure(), tracer.ClientDialOption())
package middleware

import (
	"context"
	"fmt"
	"os"

	"go.opencensus.io/exporter/stackdriver"
	"go.opencensus.io/plugin/ocgrpc"
	"go.opencensus.io/trace"
	"google.golang.org/genproto/googleapis/api/monitoredres"
	"google.golang.org/grpc"
)

// StackDriverTracer is a middleware component for the stackdriver tracer on GCP
type StackDriverTracer struct {
}

const googleProjectID = "GOOGLE_PROJECT_ID"

// NewStackDriverTracer returns a structure with the tracer; GOOGLE_PROJECT_ID must be set as an
// environment variable
func NewStackDriverTracer(ctx context.Context) (*StackDriverTracer, error) {
	projectID := os.Getenv(googleProjectID)
	if projectID == "" {
		return nil, fmt.Errorf("the following environment variable must be set %s", googleProjectID)
	}
	exporter, err := stackdriver.NewExporter(stackdriver.Options{
		ProjectID: projectID,
		// Set a MonitoredResource that represents a GKE container.
		Resource: &monitoredres.MonitoredResource{
			Type: "gke_container",
			Labels: map[string]string{
				"project_id": projectID,
			},
		},
	})
	if err != nil {
		return nil, err
	}
	trace.ApplyConfig(trace.Config{DefaultSampler: trace.AlwaysSample()})
	trace.RegisterExporter(exporter)
	return &StackDriverTracer{}, nil
}

// ClientDialOption provides a client option for the client
func (t *StackDriverTracer) ClientDialOption() (option grpc.DialOption) {
	return grpc.WithStatsHandler(&ocgrpc.ServerHandler{})
}

// ServerOptionPublicFacing provides an option for the server that is public facing
func (t *StackDriverTracer) ServerOptionPublicFacing() grpc.ServerOption {
	return grpc.StatsHandler(&ocgrpc.ServerHandler{IsPublicEndpoint: true})
}

// ServerOptionInternal provides an option for the server that is internal
func (t *StackDriverTracer) ServerOptionInternal() grpc.ServerOption {
	return grpc.StatsHandler(&ocgrpc.ServerHandler{IsPublicEndpoint: false})
}
Coming here from #65 (comment)
Converting metricsproto.SummaryValue to the Stackdriver monitoring/v3 proto is not straightforward, and as of v0.8.0 of this code (https://github.com/census-ecosystem/opencensus-go-exporter-stackdriver/tree/v0.8.0) this conversion is not yet supported.
This issue is to track that conversion for posterity and to keep it on our radar.
Version number is stuck on 0.8.0. We need to add an exported version variable to OpenCensus and use that here.
I am moving to this library from cloud.google.com/go/trace due to the obsolete notice, but noticed a few things that have me scratching my head.
I set up the package as follows (I'm using App Engine Flexible):
import (
	"net/http"
	"os"

	"contrib.go.opencensus.io/exporter/stackdriver"
	"contrib.go.opencensus.io/exporter/stackdriver/propagation"
	"go.opencensus.io/plugin/ocgrpc"
	"go.opencensus.io/plugin/ochttp"
	"go.opencensus.io/stats/view"
	"go.opencensus.io/trace"
	mrpb "google.golang.org/genproto/googleapis/api/monitoredres"
)

func InitTraceClient() error {
	res := &mrpb.MonitoredResource{}
	res.Type = "gae_app"
	res.Labels = make(map[string]string)
	res.Labels["project_id"] = os.Getenv("GOOGLE_CLOUD_PROJECT")
	res.Labels["module_id"] = os.Getenv("GAE_SERVICE")
	res.Labels["version_id"] = os.Getenv("GAE_VERSION")
	exporter, err := stackdriver.NewExporter(stackdriver.Options{
		ProjectID: os.Getenv("GOOGLE_CLOUD_PROJECT"),
		Resource:  res,
	})
	if err != nil {
		return err
	}
	view.RegisterExporter(exporter)
	trace.RegisterExporter(exporter)
	if err = view.Register(ochttp.DefaultClientViews...); err != nil {
		return err
	}
	if err = view.Register(ocgrpc.DefaultClientViews...); err != nil {
		return err
	}
	return nil
}
For my datastore client I do:
func getDatastoreClient(ctx context.Context) (*datastore.Client, error) {
	var options []option.ClientOption
	options = append(options, option.WithGRPCDialOption(grpc.WithStatsHandler(&ocgrpc.ClientHandler{})))
	return datastore.NewClient(ctx, "", options...)
}
And I add middleware to start tracing on every request:
func TraceHandler(h http.Handler) http.Handler {
	traceHandler := &ochttp.Handler{
		Handler:          h,
		Propagation:      &stackdriver.HTTPFormat{},
		IsPublicEndpoint: false, // I've tried true here as well...
		StartOptions: trace.StartOptions{
			Sampler: trace.AlwaysSample(),
		},
	}
	fn := func(w http.ResponseWriter, r *http.Request) {
		traceHandler.Handler.ServeHTTP(w, r)
	}
	return http.HandlerFunc(fn)
}
However, I get no HTTP traces in Stackdriver, just Datastore ones and other gRPC traces I didn't even intend to monitor (Logging). There are no labels associated with the spans to help identify the service/version/HTTP request the spans originated from.
Am I missing something, or is this package still not able to trace requests the way cloud.google.com/go/trace is?
Thanks for this feature in general, it's a really helpful tool and I really like the idea of OpenCensus!
Custom metrics (as produced by this exporter) are only compatible with a handful of MonitoredResource types. These have a fairly complicated set of properties associated with them. We should provide strongly-typed constructor functions to correctly build these MonitoredResources.
We could also consider an "Autodetect" MonitoredResource, which would rely on auto-detecting based on the runtime environment (Java currently does this).
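A hedged sketch of what such a constructor might look like; the Resource struct below is a stand-in for the monitoredres.MonitoredResource proto, and the label set follows the gke_container monitored-resource type:

```go
package main

import "fmt"

// Resource is a stand-in for *monitoredres.MonitoredResource so this
// sketch stays self-contained.
type Resource struct {
	Type   string
	Labels map[string]string
}

// GKEContainer builds a gke_container MonitoredResource with every
// label the type requires as a named parameter, so callers cannot
// silently omit one or misspell a label key.
func GKEContainer(projectID, clusterName, namespaceID, instanceID, podID, containerName, zone string) *Resource {
	return &Resource{
		Type: "gke_container",
		Labels: map[string]string{
			"project_id":     projectID,
			"cluster_name":   clusterName,
			"namespace_id":   namespaceID,
			"instance_id":    instanceID,
			"pod_id":         podID,
			"container_name": containerName,
			"zone":           zone,
		},
	}
}

func main() {
	r := GKEContainer("my-project", "c1", "default", "i-1", "pod-1", "app", "us-central1-a")
	fmt.Println(r.Type, len(r.Labels)) // gke_container 7
}
```

An "Autodetect" variant would fill these parameters from the metadata server and environment instead of taking them from the caller.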
Hi,
I think that this commit (#90) may have broken my export. I haven't changed my export code, recompiled yesterday and now I am getting:
2019/02/28 12:01:10 Failed to export to Stackdriver: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized resource label: pod_id: timeSeries[0]
2019/02/28 12:02:09 Failed to export to Stackdriver: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized resource label: namespace_id: timeSeries[0,1,3,4,7,9]; Unrecognized resource label: pod_id: timeSeries[2,6,8]; Unrecognized resource label: zone: timeSeries[5]
2019/02/28 12:02:10 Failed to export to Stackdriver: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized resource label: zone: timeSeries[0]
2019/02/28 12:03:09 Failed to export to Stackdriver: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized resource label: namespace_id: timeSeries[0-3,9]; Unrecognized resource label: pod_id: timeSeries[5]; Unrecognized resource label: zone: timeSeries[4,6-8]
2019/02/28 12:03:10 Failed to export to Stackdriver: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized resource label: namespace_id: timeSeries[0]
2019/02/28 12:04:09 Failed to export to Stackdriver: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized resource label: namespace_id: timeSeries[0,1,4]; Unrecognized resource label: pod_id: timeSeries[6-9]; Unrecognized resource label: zone: timeSeries[2,3,5]
2019/02/28 12:04:10 Failed to export to Stackdriver: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized resource label: instance_id: timeSeries[0]
2019/02/28 12:05:09 Failed to export to Stackdriver: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized resource label: namespace_id: timeSeries[2-7]; Unrecognized resource label: pod_id: timeSeries[0,1,9]; Unrecognized resource label: zone: timeSeries[8]
2019/02/28 12:05:10 Failed to export to Stackdriver: rpc error: code = InvalidArgument desc = One or more TimeSeries could not be written: Unrecognized resource label: zone: timeSeries[0]
I am running on GKE. My code is very simple, you can see it here:
https://github.com/DanTulovsky/playground/blob/master/frontend/run.go#L45
Is there more information I can provide here?
Thanks
Dan
The Stackdriver exporter reports the wrong OpenCensus version info in the user agent.
Maybe the version info in census-instrumentation/opencensus-go should be moved out of the internal package so that it can be imported by exporters. Otherwise it may always be out of sync.
An RPC error (e.g., "ResourceExhausted") results in a log line that does not include the failed RPC method, e.g.:
2019/01/29 02:53:20 Failed to export to Stackdriver: rpc error: code = ResourceExhausted desc = Resource has been exhausted (e.g. check quota).
That makes it hard to debug the failed RPCs, because it's not clear which RPC actually failed.
The error handler should propagate and log the name of the failed RPC.
This was originally opened in the opencensus-go repo; refer to it for the description.
See: census-instrumentation/opencensus-go#836
It is confusingly named. I recommend we replace it with something like:
DisplayNameFormatter func(*view.View) string
This will give users complete freedom to customize the display name as they see fit. Should also resolve: #22
The Stackdriver Monitoring API only allows customers to create MetricDescriptors for metrics with the custom.googleapis.com/ or external.googleapis.com/prometheus/ metric prefixes. For other prefixes, the API returns a permission-denied error.
Currently the stackdriver exporter always creates a MetricDescriptor before sending metrics, so for metrics with other domain prefixes it fails to send data to Stackdriver.
Per https://cloud.google.com/monitoring/kubernetes-engine/migration, if using the new Kubernetes monitoring, the currently autodetected resource types will need to be changed:
gke_container -> k8s_container
gce_instance -> k8s_node
Blank /http/host values are being sent because this exporter reads the r.URL.Host field (which is often empty) and ignores r.Host (which is usually populated) on http.Request objects.
The reason why is described in this Stack Overflow answer: https://stackoverflow.com/questions/42921567/what-is-the-difference-between-host-and-url-host-for-golang-http-request
I've not found a case in which r.URL.Host was the desired value, since HTTP/1.1 and newer (which require Host headers) are standard now.
Follow-up of census-instrumentation/opencensus-go#956.
Equivalent change in Java: census-instrumentation/opencensus-java#1501.
From the Google Stackdriver Trace documentation, trace link types are documented as:
CHILD_LINKED_SPAN (1) = "The linked span is a child of the current span."
PARENT_LINKED_SPAN (2) = "The linked span is a parent of the current span."
While the OpenCensus implementation documents these as:
LinkTypeChild (1) = "The current span is a child of the linked span."
LinkTypeParent (2) = "The current span is the parent of the linked span."
From what I can tell, this exporter currently translates LinkTypeChild into CHILD_LINKED_SPAN and LinkTypeParent into PARENT_LINKED_SPAN, which seems to be the opposite of what the documentation states it should be.
If I'm using go.mod with go1.11 and have
require (
contrib.go.opencensus.io/exporter/stackdriver v0.8.0
)
in my go.mod file, it brings in some unused dependencies like aws-sdk-go. If I require v0.5.0 this problem doesn't exist because it was kinda fixed in #35, but v0.8.0 re-exposes the problem. For example, v0.8.0 brings github.com/aws/[email protected], which brings github.com/jmespath/go-jmespath, which causes my go build to fail on a golang:1.11-alpine image because go-jmespath requires gcc.
I see issues like #60 that also describe this. Can we do anything about this?
cc: @rghetia
The monitoredresource.Autodetect() function uses a closure with sync.Once.Do in an attempt to avoid executing the slow operation of detecting the application's resources multiple times.
However, the way it is implemented makes the function work only for a single call: after the first execution, the function returns nil when called again.
Is that the expected behavior for this function? If so, it could be better documented.
Today OpenCensus Go only supports the custom.googleapis.com domain prefix, but Stackdriver supports more prefixes (and the list is expanding: https://cloud.google.com/monitoring/api/ref_v3/rest/v3/projects.metricDescriptors#MetricDescriptor). OC should allow users to report metrics for their registered domains.
opencensus-go-exporter-stackdriver/stats.go, line 384 (commit 2f26a5d)
Counterpart in Java: census-instrumentation/opencensus-java#1440
Requested by a cloud team that needs an enhanced Stackdriver exporter: there is a need for errors to report which fields are related to the failure. They requested that we change the stats exporter to take an OnError of the form
func OnError(err error, rows ...*view.Row)
instead of
func OnError(err error)
However, that would:
a) Be a breaking change -- those are two different signatures if users already defined:
OnError: func(e error) {
// handle error
}
b) The previously proposed change only applies to stats, yet this exporter is both a trace and stats exporter, so we also need a solution that handles both stats and tracing.
We create an introspectable error that can be type-asserted on, e.g.:
type DetailsError struct {
failedSpanData []*trace.SpanData
failedViewData []*view.Data
err error
}
// Error returns a string (not an error) so that *DetailsError
// satisfies the standard error interface.
func (de *DetailsError) Error() string {
if de == nil || de.err == nil {
return ""
}
return de.err.Error()
}
func (de *DetailsError) FailedSpanData() []*trace.SpanData { return de.failedSpanData }
func (de *DetailsError) FailedViewData() []*view.Data { return de.failedViewData }
obviously with the contract that returned attributes are read-only
sd, err := stackdriver.NewExporter(stackdriver.Options{
OnError: func(err error) {
de, ok := err.(*DetailsError)
if !ok {
return
}
if fsd := de.FailedSpanData(); len(fsd) > 0 {
// Handle the failed spans.
}
if fvd := de.FailedViewData(); len(fvd) > 0 {
// Handle the failed view data/rows.
}
},
})
and I believe with this kind of error we'd satisfy that requirement and give users the ability to introspect their errors and figure out which rows or spans failed.
/cc @Ramonza @lychung83
I am trying to use OpenCensus to report Stackdriver metrics from an App Engine app. This does not currently work, since App Engine expects a custom context to be used for all API calls, while the exporter just uses context.Background() whenever it needs one.
Is there any reason why stackdriver.NewExporter()
does not accept a custom context?
FYI, this is the specific error I am getting:
panic: not an App Engine context
goroutine 14 [running]:
panic(0x1672a60, 0xc0084374e0)
go/src/runtime/panic.go:491 +0x283
google.golang.org/appengine/internal.fullyQualifiedAppID(0x1cb64c0, 0xc008414080, 0x1d003c0, 0x0)
google.golang.org/appengine/internal/identity_classic.go:54 +0x95
google.golang.org/appengine/internal.FullyQualifiedAppID(0x1cb64c0, 0xc008414080, 0xc00870d290, 0x1ca4b00)
google.golang.org/appengine/internal/api_common.go:77 +0x98
google.golang.org/appengine/internal.AppID(0x1cb64c0, 0xc008414080, 0xc00870d290, 0x1)
google.golang.org/appengine/internal/identity.go:13 +0x35
google.golang.org/appengine.AppID(0x1cb64c0, 0xc008414080, 0xc008669301, 0x33)
google.golang.org/appengine/identity.go:20 +0x35
golang.org/x/oauth2/google.findDefaultCredentials(0x1cb64c0, 0xc008414080, 0xc0086c5440, 0x4, 0x4, 0x163e6c0, 0x1, 0xc0086c5440)
golang.org/x/oauth2/google/default.go:65 +0x52f
golang.org/x/oauth2/google.FindDefaultCredentials(0x1cb64c0, 0xc008414080, 0xc0086c5440, 0x4, 0x4, 0x4, 0x4, 0x4)
golang.org/x/oauth2/google/go19.go:48 +0x53
google.golang.org/api/internal.Creds(0x1cb64c0, 0xc008414080, 0xc008621c20, 0x18, 0x167d580, 0x30)
google.golang.org/api/internal/creds.go:41 +0x108
google.golang.org/api/transport/grpc.dial(0x1cb64c0, 0xc008414080, 0xf0fd00, 0xc0086c5400, 0x3, 0x4, 0xc008527368, 0xf3ddf4, 0xc0086c5400)
google.golang.org/api/transport/grpc/dial.go:65 +0x50e
google.golang.org/api/transport/grpc.Dial(0x1cb64c0, 0xc008414080, 0xc0086c5400, 0x3, 0x4, 0xc0086c53c0, 0xc0086c5400, 0x1d)
google.golang.org/api/transport/grpc/dial.go:37 +0x58
google.golang.org/api/transport.DialGRPC(0x1cb64c0, 0xc008414080, 0xc0086c5400, 0x3, 0x4, 0x1, 0x1, 0x1)
google.golang.org/api/transport/dial.go:41 +0x53
cloud.google.com/go/monitoring/apiv3.NewMetricClient(0x1cb64c0, 0xc008414080, 0xc0087072f0, 0x1, 0x1, 0xc0087072f0, 0x0, 0x1)
cloud.google.com/go/monitoring/apiv3/metric_client.go:106 +0xfe
contrib.go.opencensus.io/exporter/stackdriver.newStatsExporter(0xc00870b5a0, 0x20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
contrib.go.opencensus.io/exporter/stackdriver/stats.go:81 +0x112
contrib.go.opencensus.io/exporter/stackdriver.NewExporter(0xc00870b5a0, 0x20, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
contrib.go.opencensus.io/exporter/stackdriver/stackdriver.go:169 +0x7e
I noticed that when I leave my app running for a period of time, eventually it stops reporting with this error:
{
insertId: "i1qjvug3b9hn98"
labels: {…}
logName: "projects/deklerk-sandbox/logs/go-ps-consumer"
receiveTimestamp: "2018-10-15T15:03:37.561005583Z"
resource: {…}
severity: "INFO"
textPayload: "got stackdriver-opencensus err rpc error: code = Internal desc = One or more TimeSeries could not be written: An internal error occurred.: timeSeries[10,11]
"
timestamp: "2018-10-15T15:03:33Z"
}
What does this error mean? Minimally, should it include the reason why the timeseries could not be written?
In my code I have,
// Subscribe views to see stats in Stackdriver Monitoring.
if err := view.Register(
ocgrpc.ClientSentBytesPerRPCView,
ocgrpc.ClientReceivedBytesPerRPCView,
ocgrpc.ClientRoundtripLatencyView,
ocgrpc.ClientCompletedRPCsView,
ocgrpc.ClientSentMessagesPerRPCView,
ocgrpc.ClientReceivedMessagesPerRPCView,
ocgrpc.ClientServerLatencyView,
pubsub.AckCountView,
pubsub.ModAckCountView,
pubsub.NackCountView,
pubsub.PullCountView,
pubsub.StreamOpenCountView,
pubsub.StreamRequestCountView,
pubsub.StreamRetryCountView,
); err != nil {
panic(err)
}
When I go to Stackdriver, I expect to be able to group by RPC. However, the groupings for ClientCompletedRPCsView all have meaningless names.
Moved the issue from census-instrumentation/opencensus-go#766 (comment).
Original issue said:
I am instrumenting an API client for some latency-sensitive applications. Turning the sampling rate to always-on, which mimics a server receiving say 50,000 requests per second with a sampling rate of 1 in 100 (so ideally 500 traced QPS), I get back thousands of Stackdriver export errors logged on almost a 5-second interval [1]. Perhaps add some retry mechanism with exponential backoff, or use a large buffer by default; people are bound to use OpenCensus for very high-traffic applications, and it would be worrying to see a bunch of those logs.
[1] https://gist.github.com/odeke-em/32cf7359f397a4b93692bcf46109e184 with a sample inlined
$ GOOGLE_APPLICATION_CREDENTIALS=~/creds.json go run main.go
2018/05/27 20:44:47 OpenCensus Stackdriver exporter: failed to upload span: buffer full
2018/05/27 20:44:52 OpenCensus Stackdriver exporter: failed to upload 1126 spans: buffer full
2018/05/27 20:44:57 OpenCensus Stackdriver exporter: failed to upload 1652 spans: buffer full
2018/05/27 20:45:02 OpenCensus Stackdriver exporter: failed to upload 1447 spans: buffer full
2018/05/27 20:45:07 OpenCensus Stackdriver exporter: failed to upload 1598 spans: buffer full
2018/05/27 20:45:12 OpenCensus Stackdriver exporter: failed to upload 925 spans: buffer full
2018/05/27 20:45:17 OpenCensus Stackdriver exporter: failed to upload 1534 spans: buffer full
2018/05/27 20:45:22 OpenCensus Stackdriver exporter: failed to upload 1403 spans: buffer full
2018/05/27 20:45:27 OpenCensus Stackdriver exporter: failed to upload 1034 spans: buffer full
2018/05/27 20:45:32 OpenCensus Stackdriver exporter: failed to upload 1538 spans: buffer full
....
To enable this exporter to be used with the OpenCensus Agent/Service, it needs to be able to accept OpenCensus Proto metrics and convert those directly to Monitoring/v3 metrics.