Comments (6)

alxndr42 commented on September 1, 2024

@salanfe Check out PR #283

We're currently deploying a custom build of the sidecar image, but I hope the PR will be accepted, so that we can switch back to the official image.

from stackdriver-prometheus-sidecar.

alxndr42 commented on September 1, 2024

Since dropping labels in Prometheus seems to create new problems, I've added customizable label filtering on finalLabels to seriesCache.refresh(). Seems to work fine in my local build of the sidecar. Would you be interested in receiving a PR for this?

I am using the config file to define filters, for example:

label_filters:
  - metric: "^istio_(request|response|tcp).*"
    allow:
      - app
      - destination_canonical_service
      - instance
      - job
      - kubernetes_namespace
      - kubernetes_pod_name
      - response_code
      - source_canonical_service
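
The filtering idea described above can be illustrated with a small Python sketch. This is not the code from the sidecar or from any PR; `LABEL_FILTERS` and `filter_labels` are hypothetical names, and the assumed semantics (the first filter whose regex matches the metric name wins, and only the allowed labels are kept) are a guess based on the config shape.

```python
import re

# Hypothetical in-memory form of the label_filters config shown above.
LABEL_FILTERS = [
    {
        "metric": r"^istio_(request|response|tcp).*",
        "allow": {
            "app", "destination_canonical_service", "instance", "job",
            "kubernetes_namespace", "kubernetes_pod_name", "response_code",
            "source_canonical_service",
        },
    },
]


def filter_labels(metric_name, labels, filters=LABEL_FILTERS):
    """Keep only the allowed labels of the first filter matching metric_name.

    Metrics that match no filter are returned unchanged (an assumption).
    """
    for f in filters:
        if re.search(f["metric"], metric_name):
            return {k: v for k, v in labels.items() if k in f["allow"]}
    return dict(labels)


labels = {
    "app": "productpage",
    "response_code": "200",
    "source_version": "v1",
    "pod_template_hash": "6b746f74dc",
}
filtered = filter_labels("istio_requests_total", labels)
print(sorted(filtered))
```

With an allow list, the number of exported labels is bounded by the length of the list, no matter how many labels Istio or Knative attach to the series.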

salanfe commented on September 1, 2024

Hello! I'm facing the exact same issue!

Here are the steps to reproduce:

  1. Create a standard GKE cluster (I'm using 1.19.9-gke.1400) and follow the Istio setup guide for GKE.
  2. Install the Istio Operator and enable sidecar injection in the default namespace: kubectl label namespace default istio-injection=enabled
  3. Deploy the bookinfo app.
  4. Deploy Prometheus.
  5. Generate some dummy traffic on hostname:port/productpage.
  6. Check that the metrics are available in Prometheus, e.g. with istioctl dashboard prometheus, and look for the metric istio_requests_total (for example).

So far so good: this is a standard Istio installation, with some dummy istio-sidecar metrics from the bookinfo services and a Prometheus instance. The Istio metrics are visible in Prometheus, so scraping is working as expected. The goal is then to deploy the stackdriver-prometheus-sidecar as documented in Using Prometheus.

A service account for the sidecar is created, and its key is exported as a Kubernetes secret, e.g.:

$ gcloud iam service-accounts create prometheus-stackdriver --display-name prometheus-stackdriver-service-account

$ PROMETHEUS_STACKDRIVER_SA_EMAIL=$(gcloud iam service-accounts list --filter="displayName:prometheus-stackdriver-service-account" --format='value(email)')

$ gcloud projects add-iam-policy-binding ${PROJECT_ID} --role roles/monitoring.metricWriter --member serviceAccount:${PROMETHEUS_STACKDRIVER_SA_EMAIL}

$ gcloud iam service-accounts keys create prometheus-stackdriver-service-account.json --iam-account ${PROMETHEUS_STACKDRIVER_SA_EMAIL}

$ kubectl -n istio-system create secret generic prometheus-stackdriver-service-account --from-file=key.json=prometheus-stackdriver-service-account.json

Then the following patch is applied to the Prometheus (Istio) deployment to add the stackdriver-prometheus-sidecar:

# prometheus-patch.yaml
spec:
  template:
    spec:
      volumes:
        - name: google-cloud-key
          secret:
            secretName: prometheus-stackdriver-service-account
      containers:
        - name: sidecar
          image: gcr.io/stackdriver-prometheus/stackdriver-prometheus-sidecar:0.8.2
          imagePullPolicy: Always
          args:
            - "--stackdriver.project-id=XXX"
            - "--prometheus.wal-directory=/data/wal"
            - "--stackdriver.kubernetes.location=XXX"
            - "--stackdriver.kubernetes.cluster-name=XXX"
            - "--log.level=debug"
          ports:
            - name: sidecar
              containerPort: 9091
          env:
            - name: GOOGLE_APPLICATION_CREDENTIALS
              value: /var/secrets/google/key.json
          volumeMounts:
            - name: storage-volume
              mountPath: /data
            - name: google-cloud-key
              mountPath: /var/secrets/google

with, e.g.:

$ kubectl -n istio-system patch deployment prometheus --type strategic --patch="$(cat prometheus-patch.yaml)" 

Now, looking at the logs of the stackdriver-prometheus-sidecar, I get, among others, these error messages:

level=debug ts=2021-06-02T19:16:29.026Z caller=series_cache.go:395 component="Prometheus reader" msg="too many labels" labels="{__name__=\"istio_request_duration_milliseconds_bucket\",app=\"productpage\",connection_security_policy=\"unknown\",destination_app=\"reviews\",destination_canonical_revision=\"v3\",destination_canonical_service=\"reviews\",destination_cluster=\"Kubernetes\",destination_principal=\"spiffe://cluster.local/ns/default/sa/bookinfo-reviews\",destination_service=\"reviews.default.svc.cluster.local\",destination_service_name=\"reviews\",destination_service_namespace=\"default\",destination_version=\"v3\",destination_workload=\"reviews-v3\",destination_workload_namespace=\"default\",instance=\"10.72.1.9:15020\",istio_io_rev=\"default\",job=\"kubernetes-pods\",kubernetes_namespace=\"default\",kubernetes_pod_name=\"productpage-v1-6b746f74dc-mgblb\",le=\"3600000\",pod_template_hash=\"6b746f74dc\",reporter=\"source\",request_protocol=\"http\",response_code=\"200\",response_flags=\"-\",security_istio_io_tlsMode=\"istio\",service_istio_io_canonical_name=\"productpage\",service_istio_io_canonical_revision=\"v1\",source_app=\"productpage\",source_canonical_revision=\"v1\",source_canonical_service=\"productpage\",source_cluster=\"Kubernetes\",source_principal=\"spiffe://cluster.local/ns/default/sa/bookinfo-productpage\",source_version=\"v1\",source_workload=\"productpage-v1\",source_workload_namespace=\"default\",version=\"v1\"}"

At this point, just like @7adietri, I'm looking for a lightweight way to transform these Istio metrics so that they can be pushed to Cloud Monitoring. Ideally, the solution should be easy to maintain. From the Quotas and limits page for custom metrics, it looks like the maximum number of labels is 10.
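
To see how far a series is from that limit, the label names in the braces of a "too many labels" message can be counted with a short Python sketch. The regex is a rough heuristic and `label_names` is a hypothetical helper; how the sidecar itself counts labels (for example, which labels it excludes before comparing) is not modeled here, apart from skipping `__name__`.

```python
import re

# Limit quoted above from the Quotas and limits page.
MAX_LABELS = 10

# Rough heuristic: a label name, then "=", then an optionally escaped quote.
LABEL_NAME = re.compile(r'([A-Za-z_][A-Za-z0-9_]*)=\\?"')


def label_names(label_set_text):
    """Extract label names from a {name="value",...} string, skipping __name__.

    Pass only the part between the braces, not the whole log line.
    """
    return [n for n in LABEL_NAME.findall(label_set_text) if n != "__name__"]


# Shortened, made-up sample in the same escaped format as the log line.
sample = r'{__name__=\"istio_requests_total\",app=\"productpage\",le=\"3600000\"}'
names = label_names(sample)
print(len(names), "labels; over limit:", len(names) > MAX_LABELS)
```

Running the helper on the full label set from the log message above shows how many labels would have to go before the series fits.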

Naterd commented on September 1, 2024

I am currently running into this exact issue. As a GCP customer, I have had success in the past with opening a support case that asks for a PR to be reviewed and merged; it has helped get a response.

Opening a support case now :)

igorpeshansky commented on September 1, 2024

@7adietri, could you please be more specific about the issues you ran into with metric_relabel_configs (what was your exact configuration, what were the exact errors in the sidecar log, what environment you were running in)? Before we consider adding new functionality to the sidecar, it would help us to understand why the existing solution (metric relabeling on the Prometheus side) does not cover your use case. Thanks.

alxndr42 commented on September 1, 2024

@igorpeshansky Ok, so this is a typical error message (only visible at debug level) when using the sidecar in a cluster with Knative/Istio:

level=debug ts=2021-07-29T11:13:36.704Z caller=series_cache.go:395 component="Prometheus reader" msg="too many labels" labels="{__name__=\"istio_requests_total\",app=\"test1-00001\",connection_security_policy=\"mutual_tls\",destination_app=\"test1-00001\",destination_canonical_revision=\"test1-00001\",destination_canonical_service=\"test1\",destination_principal=\"spiffe://cluster.local/ns/default/sa/default\",destination_service=\"test1-00001-private.default.svc.cluster.local\",destination_service_name=\"test1-00001-private\",destination_service_namespace=\"default\",destination_version=\"unknown\",destination_workload=\"test1-00001-deployment\",destination_workload_namespace=\"default\",instance=\"10.76.1.13:15020\",istio_io_rev=\"default\",job=\"kubernetes-pods\",kubernetes_namespace=\"default\",kubernetes_pod_name=\"test1-00001-deployment-844f655ddc-jsbl2\",pod_template_hash=\"844f655ddc\",reporter=\"destination\",request_protocol=\"http\",response_code=\"200\",response_flags=\"-\",security_istio_io_tlsMode=\"istio\",service_istio_io_canonical_name=\"test1\",service_istio_io_canonical_revision=\"test1-00001\",serving_knative_dev_configuration=\"test1\",serving_knative_dev_configurationGeneration=\"1\",serving_knative_dev_configurationUID=\"07775999-0632-4a06-9680-94e82c663bb8\",serving_knative_dev_revision=\"test1-00001\",serving_knative_dev_revisionUID=\"3cce0d59-b49a-4ed3-90e9-4c8881d14c75\",serving_knative_dev_service=\"test1\",serving_knative_dev_serviceUID=\"07a4b48c-4bce-4f8c-a230-bb2223de4ecc\",source_app=\"activator\",source_canonical_revision=\"latest\",source_canonical_service=\"activator\",source_principal=\"spiffe://cluster.local/ns/knative-serving/sa/controller\",source_version=\"unknown\",source_workload=\"activator\",source_workload_namespace=\"knative-serving\"}"

Definitely more than 10 labels. Let's start by removing all the source_ and destination_ labels. I add this to the scrape_configs in prometheus.yml:

      metric_relabel_configs:
      - regex: "^(source|destination)_.*"
        action: labeldrop

Looks good in the Prometheus UI:

istio_requests_total{app="test1-00001",connection_security_policy="mutual_tls",instance="10.76.1.13:15020",istio_io_rev="default",job="kubernetes-pods",kubernetes_namespace="default",kubernetes_pod_name="test1-00001-deployment-844f655ddc-jsbl2",pod_template_hash="844f655ddc",reporter="destination",request_protocol="http",response_code="200",response_flags="-",security_istio_io_tlsMode="istio",service_istio_io_canonical_name="test1",service_istio_io_canonical_revision="test1-00001",serving_knative_dev_configuration="test1",serving_knative_dev_configurationGeneration="1",serving_knative_dev_configurationUID="07775999-0632-4a06-9680-94e82c663bb8",serving_knative_dev_revision="test1-00001",serving_knative_dev_revisionUID="3cce0d59-b49a-4ed3-90e9-4c8881d14c75",serving_knative_dev_service="test1",serving_knative_dev_serviceUID="07a4b48c-4bce-4f8c-a230-bb2223de4ecc"}
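
The effect of the labeldrop rule can be approximated in Python to preview which labels survive. This is only a sketch: Prometheus anchors relabeling regexes at both ends, which `re.fullmatch` imitates, but Prometheus uses RE2 syntax rather than Python's `re`. The `labeldrop` helper is hypothetical and the label set is a shortened sample of the series above.

```python
import re


def labeldrop(labels, regex):
    """Drop every label whose name fully matches regex."""
    pattern = re.compile(regex)
    return {k: v for k, v in labels.items() if not pattern.fullmatch(k)}


# Shortened sample of the labels on the series shown above.
labels = {
    "app": "test1-00001",
    "destination_app": "test1-00001",
    "reporter": "destination",
    "response_code": "200",
    "source_app": "activator",
    "serving_knative_dev_service": "test1",
}
kept = labeldrop(labels, r"^(source|destination)_.*")
print(sorted(kept))
```

Note that the Knative and Istio pod labels are untouched by this rule, so the series still carries far more than 10 labels.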

But in the sidecar log, new error messages start appearing:

level=debug ts=2021-07-29T12:15:26.814Z caller=client.go:202 component=storage msg="Partial failure calling CreateTimeSeries" err="rpc error: code = InvalidArgument desc = Field timeSeries[17].points[0].distributionValue had an invalid value: Distribution |explicit_buckets.bounds| entry 1 has a value of 0.5 which is less than the value of entry 0 which is 0.5."
level=warn ts=2021-07-29T12:15:26.814Z caller=queue_manager.go:534 component=queue_manager msg="Unrecoverable error sending samples to remote storage" err="rpc error: code = InvalidArgument desc = Field timeSeries[17].points[0].distributionValue had an invalid value: Distribution |explicit_buckets.bounds| entry 1 has a value of 0.5 which is less than the value of entry 0 which is 0.5."

So already this isn't working with the sidecar, and I'm not even close to 10 labels.
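
This thread does not establish the cause of that error. Since the message reports two bucket bounds with the same value of 0.5, one thing that may be worth checking is whether dropping labels leaves several previously distinct series with identical label sets. A minimal Python sketch of such a check follows; the series are made up and both helper functions are hypothetical.

```python
import re
from collections import Counter


def labeldrop(labels, regex):
    """Drop every label whose name fully matches regex."""
    pattern = re.compile(regex)
    return {k: v for k, v in labels.items() if not pattern.fullmatch(k)}


def colliding_label_sets(series, regex):
    """Return label sets that occur more than once after dropping labels."""
    counts = Counter(frozenset(labeldrop(s, regex).items()) for s in series)
    return [dict(label_set) for label_set, n in counts.items() if n > 1]


NAME = "istio_request_duration_milliseconds_bucket"
# Made-up series: the first two differ only in a label that gets dropped.
series = [
    {"__name__": NAME, "le": "0.5", "reporter": "destination", "source_app": "activator"},
    {"__name__": NAME, "le": "0.5", "reporter": "destination", "source_app": "productpage"},
    {"__name__": NAME, "le": "1", "reporter": "destination", "source_app": "activator"},
]
collisions = colliding_label_sets(series, r"(source|destination)_.*")
print(collisions)
```

If real series collide like this, dropping labels alone would not be enough; the colliding series would need to be aggregated or kept apart by some remaining label.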

On the other hand, everything works perfectly fine when I switch to a sidecar image built from #283 and use the following sidecar.yml:

    metric_label_filters:
      - metric: "^istio_(request|response|tcp).*"
        allow:
          - istio_canonical_name
          - istio_canonical_revision
          - reporter
          - request_protocol
          - response_code
          - response_flags

(The istio_ labels are mapped from service_istio_io_ in Prometheus, because the sidecar drops those.)
