Comments (7)
I have tried adding these two HPA configurations:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-prod-phpfpm
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: valinor-api-prod
  minReplicas: 4
  maxReplicas: 10
  metrics:
  - type: Pods
    pods:
      metricName: phpfpm_active_processes
      targetAverageValue: 6
---
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa-prod-external
spec:
  scaleTargetRef:
    apiVersion: apps/v1beta1
    kind: Deployment
    name: valinor-api-prod
  minReplicas: 4
  maxReplicas: 10
  metrics:
  - type: External
    external:
      metric:
        name: custom.googleapis.com|phpfpm_active_processes
      target:
        averageValue: 6
        type: AverageValue
Neither of them seems to work as it should. The one using the External type does, however, look a little stranger than the other. (Edit: Doesn't it actually look like it's correctly finding the custom metrics?) Here is the result:
I know some statistics are reaching Google, since I can see phpfpm_active_processes in Metrics Explorer as a custom metric. It does look a bit sketchy, though, since the active processes don't seem to be distributed across the different pods, at least according to Metrics Explorer. But when I look directly at /fpm-status and keep refreshing, I can see that the active process count differs on each pod in most cases.
The cause might be the phpfpm exporter container or the Google prometheus-to-sd container, as the latter is still throwing this error:
1 stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
I was hoping I could scale based on active FPM processes. If that is not possible, would it be possible to scale based on requests per second to each pod, something like the first example in this custom metrics adapter?
Edit: But then the HPA using the External type sometimes looks like this, and that is what I find weird.
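For reference, here is a rough sketch of the scaling rule described in the Kubernetes HPA documentation, applied to a per-pod average target such as targetAverageValue: 6. The function name desiredReplicas and the 0.1 tolerance passed in main are assumptions made for illustration; the real controller also accounts for unready pods, missing metrics, and stabilization windows, which are left out here.

```go
package main

import (
	"fmt"
	"math"
)

// desiredReplicas is a hypothetical helper that applies the documented rule
//   desired = ceil(currentReplicas * currentAverage / targetAverage)
// It keeps the current count when the ratio is within tolerance of 1.0,
// then clamps the result to [minReplicas, maxReplicas].
func desiredReplicas(current int, currentAvg, targetAvg, tolerance float64, minReplicas, maxReplicas int) int {
	ratio := currentAvg / targetAvg
	desired := current
	if math.Abs(ratio-1.0) > tolerance {
		desired = int(math.Ceil(float64(current) * ratio))
	}
	if desired < minReplicas {
		desired = minReplicas
	}
	if desired > maxReplicas {
		desired = maxReplicas
	}
	return desired
}

func main() {
	// 4 pods averaging 9 active processes against a target of 6,
	// with minReplicas 4 and maxReplicas 10 as in the configs above.
	fmt.Println(desiredReplicas(4, 9, 6, 0.1, 4, 10))
}
```

If the metrics API returns no items for the pods, there is no average to feed into this calculation, which would be consistent with the HPA not scaling at all.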
from k8s-stackdriver.
I see the same logs from gcr.io/google-containers/prometheus-to-sd, and as a result I could indeed not use phpfpm_active_processes as an HPA metric.
You can also see that if you call: kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/YOUR_NAMESPACE/pods/*/phpfpm_active_processes" the items will be empty.
The HPA uses that same endpoint, which is why it's not working: there are no items.
I could solve this issue by using version v0.9.0 instead of v0.9.2.
The ..MarshalJSON... log is still there; however, the metric is now properly retrieved by the HPA (and by the kubectl command).
I can now use phpfpm_active_processes for my HPA.
I hope this helps.
from k8s-stackdriver.
"I could solve this issue by using the version v0.9.0 instead of v0.9.2"
If you are talking about the prometheus-to-sd image version, then as you can see in my issue post, I am already using gcr.io/google-containers/prometheus-to-sd:v0.9.2
Edit: Ahh shit sorry, I read it in the wrong order. I have been looking at yaml configs for too long. I will try it out, thank you.
And I cannot use phpfpm_active_processes as an HPA metric. I also tried to get the raw output by running a command like the one you suggest. As you say, the items are empty, and I figured that was the reason the metric is not working.
Can you give me any insight into what you might have done differently in your setup?
from k8s-stackdriver.
Here is my full yaml for the prometheus-to-sd sidecar:
- name: prometheus-to-sd
  image: gcr.io/google-containers/prometheus-to-sd:v0.9.0
  ports:
  - name: profiler
    containerPort: 6060
  command:
  - /monitor
  - --stackdriver-prefix=custom.googleapis.com
  - --source=:http://localhost:9253
  - --pod-id=$(POD_NAME)
  - --namespace-id=$(POD_NAMESPACE)
  - --cluster-location=$(CLUSTER_REGION)
  - --monitored-resource-type-prefix=k8s_
  - --scrape-interval=10s
  - --export-interval=10s
  resources:
    requests:
      cpu: 10m
    limits:
      cpu: 10m
  env:
  - name: POD_NAME
    valueFrom:
      fieldRef:
        fieldPath: metadata.name
  - name: POD_NAMESPACE
    valueFrom:
      fieldRef:
        fieldPath: metadata.namespace
  - name: CLUSTER_REGION
    value: REDACTED
I think it looks pretty similar to yours.
But as I said previously, I was using v0.9.2 initially, which resulted in the HPA not working.
Then I switched to v0.9.0 and then the HPA was working.
My HPA yaml is like so:
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: REDACTED
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: REDACTED
  minReplicas: 3
  maxReplicas: 12
  metrics:
  - type: Pods
    pods:
      metric:
        name: phpfpm_active_processes
      target:
        type: AverageValue
        averageValue: 80 # or whatever fits your case
And I'm using Kubernetes v1.24.1
from k8s-stackdriver.
It does look very similar. I am afraid, though, that it is not only this container/adapter that is causing me trouble. Are you using the Google Custom Metrics Stackdriver adapter?
I had some trouble setting it up to begin with, but had more success after installing it like this, following this issue:
gcloud iam service-accounts create custom-metrics-sd-adapter --project "$GCP_PROJECT_ID"
gcloud projects add-iam-policy-binding "$GCP_PROJECT_ID" \
  --member "serviceAccount:custom-metrics-sd-adapter@$GCP_PROJECT_ID.iam.gserviceaccount.com" \
  --role "roles/monitoring.editor"
gcloud iam service-accounts add-iam-policy-binding \
  --role roles/iam.workloadIdentityUser \
  --member "serviceAccount:$GCP_PROJECT_ID.svc.id.goog[custom-metrics/custom-metrics-stackdriver-adapter]" \
  "custom-metrics-sd-adapter@$GCP_PROJECT_ID.iam.gserviceaccount.com"
kubectl create -f https://raw.githubusercontent.com/GoogleCloudPlatform/k8s-stackdriver/master/custom-metrics-stackdriver-adapter/deploy/production/adapter.yaml
kubectl annotate serviceaccount custom-metrics-stackdriver-adapter \
  "iam.gke.io/gcp-service-account=custom-metrics-sd-adapter@$GCP_PROJECT_ID.iam.gserviceaccount.com" \
  --namespace custom-metrics
But I am getting errors like this:
E0712 08:46:48.948553 1 writers.go:117] apiserver was unable to write a JSON response: http2: stream closed
E0712 08:46:48.948614 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http2: stream closed"}: http2: stream closed
E0712 08:46:48.949679 1 writers.go:130] apiserver was unable to write a fallback JSON response: http2: stream closed
E0712 08:46:48.951290 1 wrap.go:54] timeout or abort while handling: method=GET URI="/apis/custom.metrics.k8s.io/v1beta2" audit-ID="5b1dd951-6bba-4174-afd6-b4dccf87180d"
E0712 08:46:48.951344 1 timeout.go:135] post-timeout activity - time-elapsed: 3.869µs, GET "/apis/custom.metrics.k8s.io/v1beta2" result: <nil>
E0712 08:46:49.148354 1 writers.go:117] apiserver was unable to write a JSON response: http2: stream closed
E0712 08:46:49.148403 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http2: stream closed"}: http2: stream closed
E0712 08:46:49.148460 1 wrap.go:54] timeout or abort while handling: method=GET URI="/apis/custom.metrics.k8s.io/v1beta1" audit-ID="7a3c4de3-18f0-4846-94e2-104be13e4ff1"
E0712 08:46:49.149401 1 writers.go:117] apiserver was unable to write a JSON response: http2: stream closed
E0712 08:46:49.149460 1 writers.go:130] apiserver was unable to write a fallback JSON response: http2: stream closed
E0712 08:46:49.149801 1 writers.go:111] apiserver was unable to close cleanly the response writer: http2: stream closed
E0712 08:46:49.149857 1 wrap.go:54] timeout or abort while handling: method=GET URI="/apis/custom.metrics.k8s.io/v1beta1" audit-ID="7d8e9fdb-3f23-4fa8-952a-8f6d3daeb258"
E0712 08:46:49.150521 1 wrap.go:54] timeout or abort while handling: method=GET URI="/apis/custom.metrics.k8s.io/v1beta2" audit-ID="ed232e12-b6e4-4f1e-a407-73c65c7d23ab"
E0712 08:46:49.152893 1 timeout.go:135] post-timeout activity - time-elapsed: 4.392273ms, GET "/apis/custom.metrics.k8s.io/v1beta1" result: <nil>
E0712 08:46:49.156903 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http2: stream closed"}: http2: stream closed
E0712 08:46:49.157252 1 timeout.go:135] post-timeout activity - time-elapsed: 7.365381ms, GET "/apis/custom.metrics.k8s.io/v1beta1" result: <nil>
E0712 08:46:49.163731 1 writers.go:117] apiserver was unable to write a JSON response: http2: stream closed
E0712 08:46:49.163798 1 wrap.go:54] timeout or abort while handling: method=GET URI="/apis/custom.metrics.k8s.io/v1beta1" audit-ID="3ff962e1-5437-4282-a35a-5c5d7e5e5c5a"
E0712 08:46:49.164040 1 writers.go:117] apiserver was unable to write a JSON response: http2: stream closed
E0712 08:46:49.164435 1 writers.go:117] apiserver was unable to write a JSON response: http2: stream closed
E0712 08:46:49.164811 1 wrap.go:54] timeout or abort while handling: method=GET URI="/apis/custom.metrics.k8s.io/v1beta2" audit-ID="b69d7b5d-fef9-4ed0-9de7-1474505b516e"
E0712 08:46:49.165787 1 writers.go:117] apiserver was unable to write a JSON response: http2: stream closed
E0712 08:46:49.166040 1 wrap.go:54] timeout or abort while handling: method=GET URI="/apis/custom.metrics.k8s.io/v1beta1" audit-ID="7afb8bb4-c92e-4004-9a9e-d5573060011e"
E0712 08:46:49.166264 1 writers.go:130] apiserver was unable to write a fallback JSON response: http2: stream closed
E0712 08:46:49.166924 1 wrap.go:54] timeout or abort while handling: method=GET URI="/apis/custom.metrics.k8s.io/v1beta1" audit-ID="fd644be2-e351-4ad3-8c09-7018cda0c978"
E0712 08:46:49.167584 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http2: stream closed"}: http2: stream closed
E0712 08:46:49.171427 1 wrap.go:54] timeout or abort while handling: method=GET URI="/apis/custom.metrics.k8s.io/v1beta2" audit-ID="6494d9df-d5bf-402e-ae48-58b130b11625"
E0712 08:46:49.171478 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http2: stream closed"}: http2: stream closed
E0712 08:46:49.243937 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http2: stream closed"}: http2: stream closed
E0712 08:46:49.243949 1 writers.go:117] apiserver was unable to write a JSON response: http: Handler timeout
E0712 08:46:49.246406 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http2: stream closed"}: http2: stream closed
E0712 08:46:49.248929 1 timeout.go:135] post-timeout activity - time-elapsed: 98.366807ms, GET "/apis/custom.metrics.k8s.io/v1beta2" result: <nil>
E0712 08:46:49.251310 1 writers.go:130] apiserver was unable to write a fallback JSON response: http2: stream closed
E0712 08:46:49.253716 1 writers.go:130] apiserver was unable to write a fallback JSON response: http2: stream closed
E0712 08:46:49.254860 1 writers.go:130] apiserver was unable to write a fallback JSON response: http2: stream closed
E0712 08:46:49.255997 1 status.go:71] apiserver received an error that is not an metav1.Status: &errors.errorString{s:"http: Handler timeout"}: http: Handler timeout
E0712 08:46:49.257172 1 writers.go:130] apiserver was unable to write a fallback JSON response: http2: stream closed
E0712 08:46:49.259616 1 timeout.go:135] post-timeout activity - time-elapsed: 95.781565ms, GET "/apis/custom.metrics.k8s.io/v1beta1" result: <nil>
E0712 08:46:49.260765 1 timeout.go:135] post-timeout activity - time-elapsed: 94.56363ms, GET "/apis/custom.metrics.k8s.io/v1beta1" result: <nil>
E0712 08:46:49.261896 1 timeout.go:135] post-timeout activity - time-elapsed: 95.660411ms, GET "/apis/custom.metrics.k8s.io/v1beta2" result: <nil>
E0712 08:46:49.262878 1 writers.go:130] apiserver was unable to write a fallback JSON response: http: Handler timeout
E0712 08:46:49.264115 1 timeout.go:135] post-timeout activity - time-elapsed: 96.887285ms, GET "/apis/custom.metrics.k8s.io/v1beta1" result: <nil>
E0712 08:46:49.268715 1 timeout.go:135] post-timeout activity - time-elapsed: 97.200859ms, GET "/apis/custom.metrics.k8s.io/v1beta2" result: <nil>
Are you using this custom metrics adapter or are you using another one? If you are using this one, are you also getting these errors? If not, how did you set that up?
from k8s-stackdriver.
I have just tried setting it up again with the different version number, and my items are still empty, unfortunately.
As I mentioned above, I'm pretty sure it's the Google Stackdriver adapter that's causing me trouble now. I set it up as described and I am no longer getting 403 Forbidden errors like I did in the beginning. But as the errors above suggest, there is still something wrong with the adapter. It is getting <nil> when fetching the custom metrics. I have no idea what is going on.
When running the command:
kubectl get --raw "/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/*/phpfpm_active_processes"
The result I get back is:
{"kind":"MetricValueList","apiVersion":"custom.metrics.k8s.io/v1beta1","metadata":{"selfLink":"/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/phpfpm_active_processes"},"items":[]}
And my containers are running in the default namespace.
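To make that check repeatable, here is a minimal sketch that parses a response like the one above and reports whether it contains any items. The type metricValueList and the function hasMetricItems are hypothetical names used only for this example, and the struct models only the fields that are read; it is not the adapter's own type.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// metricValueList is a hypothetical, partial view of the MetricValueList
// response. Only the fields inspected below are modeled.
type metricValueList struct {
	Kind  string            `json:"kind"`
	Items []json.RawMessage `json:"items"`
}

// hasMetricItems reports whether the raw custom metrics API response
// contains at least one metric item.
func hasMetricItems(raw []byte) (bool, error) {
	var list metricValueList
	if err := json.Unmarshal(raw, &list); err != nil {
		return false, err
	}
	return len(list.Items) > 0, nil
}

func main() {
	// The response quoted above, as returned by kubectl get --raw.
	raw := []byte(`{"kind":"MetricValueList","apiVersion":"custom.metrics.k8s.io/v1beta1","metadata":{"selfLink":"/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/phpfpm_active_processes"},"items":[]}`)
	found, err := hasMetricItems(raw)
	fmt.Println(found, err)
}
```

An empty items list only shows that the adapter returned nothing for that namespace and metric name; it does not by itself say whether the exporter, prometheus-to-sd, or the adapter is at fault.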
Is there a command to check all namespaces for that metric? Just for fun. I'm still new to k8s and trying to learn as much as I can. At this point I have no idea what could be causing issues other than Stackdriver, and I'm kind of over it.
from k8s-stackdriver.
I have a suspicion that there might be some issues when the phpfpm data is being sent to Metrics Explorer. I was getting an error message similar to the one below:
stackdriver.go:60] Error while sending request to Stackdriver json: error calling MarshalJSON for type *monitoring.CreateTimeSeriesRequest: json: error calling MarshalJSON for type *monitoring.TimeSeries: json: error calling MarshalJSON for type *monitoring.Point: json: error calling MarshalJSON for type *monitoring.TypedValue: json: error calling MarshalJSON for type *monitoring.Distribution: json: unsupported value: NaN
My PHP-FPM metrics were not being sent to Metrics Explorer at all, but after stepping through the code here (k8s-stackdriver/prometheus-to-sd/translator/stackdriver.go, lines 54 to 55 in 0463e9b) and filtering out the ts elements of the slice with the problematic Distribution values, I could get the phpfpm metrics to work:
defer wg.Done()
// Keep only the time series that carry no Distribution values.
var timeSeries []*v3.TimeSeries
for _, singleTimeSeries := range ts[begin:end] {
	anyDistributionValuesFound := false
	for _, point := range singleTimeSeries.Points {
		if point.Value.DistributionValue != nil {
			anyDistributionValuesFound = true
		}
	}
	if !anyDistributionValuesFound {
		timeSeries = append(timeSeries, singleTimeSeries)
	}
}
// Send the filtered slice instead of the original ts[begin:end].
req := &v3.CreateTimeSeriesRequest{TimeSeries: timeSeries}
What I assume is happening is that, further down the line, when this bulk ts []*v3.TimeSeries is sent, whether the phpfpm metrics are sent depends on where they sit in the slice: the request errors out once a bad Distribution value is reached, so none of the remaining metrics are sent.
This is by no means a fix for the underlying problem, just an observation and a quick workaround I was able to put in place. I assume these issues with the Distribution metrics originate somewhere upstream.
from k8s-stackdriver.