
jthomperoo / predictive-horizontal-pod-autoscaler


Horizontal Pod Autoscaler built with predictive abilities using statistical models

License: Apache License 2.0

Languages: Go 89.63%, Python 8.88%, Makefile 0.85%, Dockerfile 0.64%
Topics: predictive-analytics, kubernetes, autoscaler, horizontal-pod-autoscaler, predictions, statistical-models, replicas, autoscaling, go, golang

predictive-horizontal-pod-autoscaler's People

Contributors

dpatterbee, jthomperoo


predictive-horizontal-pod-autoscaler's Issues

failed to get resource metric

Hi, I see this error in cpa pod logs: "Shell command failed, stderr: 2022/01/30 13:29:20 invalid metrics (1 invalid out of 1), first error is: failed to get resource metric: unable to get metrics for resource cpu: no metrics returned from resource metrics API"

Steps to reproduce the behavior:

  1. Deploy latest phpa

Additional context
Can you please check whether you see this error in your logs, and whether it actually affects the PHPA?

Unable to use the tuning service with a week of data

Describe the bug
We ran load tests with a seasonal period of 240 (one hour) and it works great, better than expected.
However, when changing it to 5760 (one day) with between 1 and 4 stored seasons, the tuning service returns 414 URI Too Long.

Predictive Horizontal Pod Autoscaler Version
v0.13.0

To Reproduce
Steps to reproduce the behavior:

  1. Install predictive-horizontal-pod-autoscaler crd
  2. Set the PredictiveHorizontalPodAutoscaler manifest like so:
apiVersion: jamiethompson.me/v1alpha1
kind: PredictiveHorizontalPodAutoscaler
metadata:
  labels:
    app.kubernetes.io/instance: <my-pod-instance>
  name: <my-pod-instance>
  namespace: default
spec:
  behavior:
    scaleDown:
      policies:
        - periodSeconds: 330
          type: Percent
          value: 10
      stabilizationWindowSeconds: 300
  maxReplicas: 200
  metrics:
    - resource:
        name: cpu
        target:
          averageUtilization: 40
          type: Utilization
      type: Resource
  minReplicas: 40
  models:
    - holtWinters:
        alpha: 0.9
        beta: 0.9
        gamma: 0.9
        runtimeTuningFetchHook:
          http:
            method: GET
            parameterMode: query
            successCodes:
              - 200
            url: 'http://tuning.default.svc.cluster.local/holt_winters'
          timeout: 2500
          type: http
        seasonal: additive
        seasonalPeriods: 240
        storedSeasons: 3
        trend: additive
      name: HoltWintersPrediction
      type: HoltWinters
  scaleTargetRef:
    apiVersion: argoproj.io/v1alpha1
    kind: Rollout
    name: <my-pod-instance>
  syncPeriod: 30000
  3. Install the Helm chart containing the deployment and the tuning service
  4. See the 414 URI Too Long error code

Expected behavior
Expected the tuning service to work its magic on a week of data (so it can be used in production). Instead it only handles an hour's worth of data.

Kubernetes Details (kubectl version):
v1.22

Additional context
We need this feature because we have large peaks (from 20 pods to 100) at a specific time.
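
For context on why this hits a hard limit: with seasonalPeriods set to 5760 and a few stored seasons, the hook has to pass on the order of 10,000 to 20,000 replica values to the tuning service. Encoded into a GET query string at a few characters per value plus separators, that is easily tens of kilobytes, while most HTTP servers reject request lines above roughly 8 KB with exactly this 414 status. If the history could be exchanged in a request body instead (whether the current runtimeTuningFetchHook supports a body parameter mode is worth verifying), the limit would not apply. Below is a minimal Go sketch of a tuning endpoint that accepts such a POST body; the payload shape and the fixed response values are placeholders, not the project's actual hook schema.

package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// tuningRequest is a hypothetical payload shape; the real hook schema may differ.
type tuningRequest struct {
	Evaluations []int `json:"evaluations"`
}

// tuningResponse returns tuned Holt-Winters smoothing parameters.
type tuningResponse struct {
	Alpha float64 `json:"alpha"`
	Beta  float64 `json:"beta"`
	Gamma float64 `json:"gamma"`
}

func holtWintersHandler(w http.ResponseWriter, r *http.Request) {
	// Accept the series in a POST body rather than a query string so a week of
	// samples does not blow past typical URL length limits.
	if r.Method != http.MethodPost {
		http.Error(w, "use POST with a JSON body", http.StatusMethodNotAllowed)
		return
	}
	var req tuningRequest
	if err := json.NewDecoder(r.Body).Decode(&req); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	// Placeholder tuning logic; real tuning (e.g. a grid search) would go here.
	resp := tuningResponse{Alpha: 0.9, Beta: 0.9, Gamma: 0.9}
	w.Header().Set("Content-Type", "application/json")
	json.NewEncoder(w).Encode(resp)
}

func main() {
	http.HandleFunc("/holt_winters", holtWintersHandler)
	log.Fatal(http.ListenAndServe(":80", nil))
}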

Add documentation

Documentation as code should be provided, with a configuration reference and user guides.

holt winters algorithm db: account for replica evaluations not in db

Is your feature request related to a problem? Please describe.
Today, evaluations made while (dbModel.IntervalsPassed < model.PerInterval) are not represented in the database (they are skipped).
Since Holt-Winters is a runtime-expensive operation (especially when the seasonality period is as long as a week), model.PerInterval can be quite high (e.g. model.tickInterval = 30 sec, model.PerInterval = >15 min / model.tickInterval). This leads to information missing from the model.

Describe the solution you'd like
While it is important to only run the algorithm at sufficiently large intervals of time, the missing evaluations should still be represented statistically in the database.

Describe alternatives you've considered
extend db model to keep evaluations in between. once a run iteration kicks in (isRunInterval && isRunType == true)
apply some kind of filter over amassed evaluations, update db with the result and dismiss the evaluations.
Thus, only limited number of evaluations (sec,model.PerInterval-1) will be added to db in addition to model.storedSeasons
types of filter may include:

  • mean
  • max
  • median
  • random walk (effectively what is done today)
  • run a script
  • etc
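
As a rough illustration of the alternative described above, the sketch below applies one of the proposed filters (mean, max, or median) over the evaluations amassed between Holt-Winters runs; the type and function names are hypothetical and not the project's actual internals.

package main

import (
	"fmt"
	"sort"
)

// FilterFunc reduces the replica evaluations collected between runs into a single
// value to store in the database alongside the regular per-interval entries.
type FilterFunc func(replicas []int32) int32

func Mean(replicas []int32) int32 {
	var sum int64
	for _, r := range replicas {
		sum += int64(r)
	}
	return int32(sum / int64(len(replicas)))
}

func Max(replicas []int32) int32 {
	max := replicas[0]
	for _, r := range replicas[1:] {
		if r > max {
			max = r
		}
	}
	return max
}

func Median(replicas []int32) int32 {
	sorted := append([]int32(nil), replicas...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i] < sorted[j] })
	return sorted[len(sorted)/2]
}

func main() {
	// Evaluations gathered on each tick while IntervalsPassed < PerInterval.
	pending := []int32{42, 47, 45, 51, 44}
	fmt.Println(Mean(pending), Max(pending), Median(pending)) // 45 51 45
}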


Convert from being a Custom Pod Autoscaler to being controller based

Is your feature request related to a problem? Please describe.
The Custom Pod Autoscaler approach used in this project has been really useful for getting a working version ready and out the door, but now it would be better to mature the project by converting it to an operator-focused autoscaler.

This would mean that the PHPA would have its own CRD, which would be managed by a single controller which handles all of the scaling logic for every PHPA.

Holt-Winters Python 3.8 algorithm code fails due to outdated dependencies of statsmodels 0.12.1

Describe the bug
Running the Holt-Winters model errors out with:
Cannot import name '_centered' from 'scipy.signal.signaltools'

To Reproduce
Steps to reproduce the behavior:

  1. Deploy latest version of phpa
  2. Run holt-winters example
  3. See error as described above

Expected behavior
holt winters should run to completion without errors

Kubernetes Details (kubectl version):
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:38:33Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:32:32Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
predictive-horizontal-pod-autoscaler commit a8f581d

Additional context
Modifying the Docker image to use the CPA Python 3.7 base instead of 3.8 resolves the issue, however this is just a workaround.
Upgrading to the latest statsmodels, scipy, or both should solve the issue properly.

Update to CPA 0.11.0 to add downscale stabilization

Update to CPA v0.11.0, requires updating CPA-HPA.

Depends on jthomperoo/horizontal-pod-autoscaler#14

[v0.11.0] - 2020-02-28

Added

  • Series of hooks for injecting user logic throughout the execution process.
    • preMetric - Runs before metric gathering, given metric gathering input.
    • postMetric - Runs after metric gathering, given metric gathering input and result.
    • preEvaluate - Runs before evaluation, given evaluation input.
    • postEvaluate - Runs after evaluation, given evaluation input and result.
    • preScale - Runs before scaling decision, given min and max replicas, current replicas, target replicas, and resource being scaled.
    • postScale - Runs after scaling decision, given min and max replicas, current replicas, target replicas, and resource being scaled.
  • New downscaleStabilization option, based on the Horizontal Pod Autoscaler downscale stabilization, operates by taking the maximum target replica count over the stabilization window (a rough sketch of this behaviour is shown below).
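
As a rough sketch of the stabilization behaviour described above (not the CPA's actual implementation), the downscale decision is taken as the maximum target replica count recorded within the stabilization window:

package main

import (
	"fmt"
	"time"
)

// timestampedEvaluation records a target replica count and when it was calculated.
type timestampedEvaluation struct {
	Time           time.Time
	TargetReplicas int32
}

// stabilizedReplicas returns the maximum target replica count seen within the
// stabilization window, preventing rapid downscales from a single low reading.
func stabilizedReplicas(history []timestampedEvaluation, now time.Time, window time.Duration) int32 {
	var max int32
	for _, ev := range history {
		if now.Sub(ev.Time) <= window && ev.TargetReplicas > max {
			max = ev.TargetReplicas
		}
	}
	return max
}

func main() {
	now := time.Now()
	history := []timestampedEvaluation{
		{now.Add(-4 * time.Minute), 10},
		{now.Add(-2 * time.Minute), 6},
		{now.Add(-30 * time.Second), 4},
	}
	// With a 5 minute window the autoscaler holds at 10 replicas rather than dropping to 4.
	fmt.Println(stabilizedReplicas(history, now, 5*time.Minute)) // 10
}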

Changed

  • Metrics from API now returns the entire resource definition as JSON rather than just the resource name.
  • Changed JSON generated to be in camelCase rather than snake_case for consistency with the Kubernetes API.
    • Evaluation now uses targetReplicas over target_replicas.
    • ResourceMetric now uses runType over run_type.
    • Scale hook now provided with minReplicas, maxReplicas, currentReplicas and targetReplicas rather than their snakecase equivalents.
  • Metric gathering and hooks have access to dryRun field, allowing them to determine if they are called as part of a dry run.
  • Standardised input to metric gatherer, evaluator and scaler to take specs rather than lists of parameters, allowing easier serialisation for hooks.
  • Endpoint /api/v1/metrics now accepts the optional dry_run parameter for marking metric gathering as in dry run mode.
  • ResourceMetrics replaced with a list of Metric and a Resource.
  • /api/v1/metrics now simply returns a list of Metrics rather than a ResourceMetrics.

Removed

  • ResourceMetrics struct removed as it was redundant.

Support custom metrics retrieval from a custom controller (CRD based) with per-resource mode

Describe the bug
Metrics retrieval works fine with predefined resource types (such as Deployment).
However, switching to a custom controller with a CRD fails to retrieve metrics, with the error:
2022/04/05 19:10:52 no kind "MyCC" is registered for version "MyCC.com/v1" in scheme "/app/main.go:121"
Note that the Custom Pod Autoscaler, which recently underwent a code refactor, successfully resolves the CRD type and retrieves the scale object.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy custom controller
  2. Configure phpa to reference it via scaleTargetRef
  3. Deploy the PHPA and the relevant ClusterRole + ClusterRoleBinding for accessing the CRD.
  4. Wait until the error appears in the logs; the PHPA fails to get the metric.
  5. See error

Expected behavior
PHPA instance should be able to retrieve metrics just as running kubectl get --raw /api/... does.
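
One way the target could be resolved without registering every CRD's Go types in the scheme is to go through the dynamic client and the generic scale subresource. The sketch below illustrates that approach; it is not the project's actual fix, and the group/version/resource values are placeholders.

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/rest"
)

// getScale fetches the scale subresource of an arbitrary custom resource without
// needing its Go types registered in a runtime scheme.
func getScale(cfg *rest.Config, namespace, name string) (*unstructured.Unstructured, error) {
	client, err := dynamic.NewForConfig(cfg)
	if err != nil {
		return nil, err
	}
	// Placeholder GVR for the custom controller; replace with the real group/version/resource.
	gvr := schema.GroupVersionResource{Group: "mycc.com", Version: "v1", Resource: "myccs"}
	return client.Resource(gvr).Namespace(namespace).Get(context.TODO(), name, metav1.GetOptions{}, "scale")
}

func main() {
	cfg, err := rest.InClusterConfig()
	if err != nil {
		panic(err)
	}
	scale, err := getScale(cfg, "default", "my-custom-resource")
	if err != nil {
		panic(err)
	}
	// The selector and replica counts live under .spec/.status of the scale subresource.
	fmt.Println(scale.Object["spec"], scale.Object["status"])
}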

Kubernetes Details (kubectl version):
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:38:33Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:32:32Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
PHPA - up to date
CPA - up to date


Regarding history size

Hi friend, I was testing the PHPA in one of the EKS clusters and noticed that the history size for the linear regression model is displayed. I have a few questions about this; I would appreciate it if you could provide some details. Thank you.

  1. Does PHPA save the history size for the linear regression model in memory, or does it use a config map to store it?
  2. In the event of software reinstallation, how can the old data be retrieved?
  3. When I run 'kubectl get HPA', the HPA details are not displayed. In this case, how can we determine the current CPU and memory usage of the pods?
    @jthomperoo

phpa scaledown behavior

Is your feature request related to a problem? Please describe.
Hi, I'm trying to set a scale-down behavior, which is available in the autoscaling/v2beta2 API. I tried to add it to the PHPA but it's not working, so I assume it's built on top of the autoscaling/v1 HPA.

Describe the solution you'd like
Can you please update the PHPA operator so that it allows using the scaling policies?

Describe alternatives you've considered
I tried to add the behavior to the PHPA using the "autoscaling.alpha.kubernetes.io/behavior" annotation, but unfortunately it's not stable.

Include HPA calculation as part of predictive decision making calculation

At the minute, the actual evaluation value calculated by the HPA logic is only used in a final comparison with the decided-upon predicted value, and it is only used when it is greater than the predicted value. This lacks flexibility and can defeat the utility of the mean, median and minimum decision types.

The HPA evaluation should be accounted for in the mean, median, minimum, and maximum decision types, to allow for more consistent and flexible decision making.
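
As a rough sketch of the proposed change (hypothetical helper, not the project's internals), the HPA-calculated value would simply be included alongside the model predictions before the decision type is applied:

package main

import (
	"fmt"
	"sort"
)

// decideReplicas applies the decision type over the model predictions plus the value
// calculated by the standard HPA logic, rather than only comparing against it at the end.
func decideReplicas(decisionType string, hpaCalculated int32, predictions []int32) int32 {
	values := append([]int32{hpaCalculated}, predictions...)
	sort.Slice(values, func(i, j int) bool { return values[i] < values[j] })
	switch decisionType {
	case "minimum":
		return values[0]
	case "maximum":
		return values[len(values)-1]
	case "median":
		return values[len(values)/2]
	case "mean":
		var sum int64
		for _, v := range values {
			sum += int64(v)
		}
		return int32(sum / int64(len(values)))
	default:
		return hpaCalculated
	}
}

func main() {
	// HPA calculates 8 replicas, two models predict 6 and 12.
	fmt.Println(decideReplicas("mean", 8, []int32{6, 12}))    // 8
	fmt.Println(decideReplicas("minimum", 8, []int32{6, 12})) // 6
}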

Custom Pod Autoscaler does not integrate with Argo Rollouts

Describe the bug
The Custom Pod Autoscaler does not integrate with Argo Rollouts.

To Reproduce
using argo rollouts

Expected behavior
I added rollouts under rules.resources and argoproj.io under rules.apiGroups in the Role, which I thought would fix the issue, but I still get 'no kind "Rollout" is registered for version "argoproj.io/v1alpha1"' in the Custom Pod Autoscaler logs.

Kubernetes Details (kubectl version):
v1.19.8

Additional context
can you please add argo rollouts support?

Add 'Getting Started' guide

A simple Getting Started guide would be useful, showing a really simple use case of the PHPA - most likely using a linear regression.

Linear regression model fails on first run

Describe the bug
When using the linear regression model, once the first replica has been calculated the model will output a stack trace for the first run:

I0405 22:48:02.973332       1 shell.go:84] Shell command failed, stderr: 2021/04/05 22:48:02 exit status 1: Traceback (most recent call last):
  File "/app/algorithms/linear_regression/linear_regression.py", line 131, in <module>
    print(math.ceil(model.predict([[1, 0]])[0]), end="")
  File "/usr/local/lib/python3.8/site-packages/statsmodels/base/model.py", line 1099, in predict
    predict_results = self.model.predict(self.params, exog, *args,
  File "/usr/local/lib/python3.8/site-packages/statsmodels/regression/linear_model.py", line 380, in predict
    return np.dot(exog, params)
  File "<__array_function__ internals>", line 5, in dot
ValueError: shapes (1,2) and (1,) not aligned: 2 (dim 1) != 1 (dim 0)

To Reproduce
Steps to reproduce the behavior:

  1. Run the example in examples/simple-linear

Expected behavior
A stack trace should not be dumped out here; instead, the model should skip this first run and rely solely on the calculated target value.

A minimum length check should be added to https://github.com/jthomperoo/predictive-horizontal-pod-autoscaler/blob/master/internal/prediction/linear/linear.go#L56: if there are too few data points to do a linear regression with (fewer than 2), it should skip the prediction and just use the calculated value.
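
A minimal, self-contained sketch of that guard (the surrounding function and the simple least-squares fit are illustrative, not the actual code in linear.go):

package main

import (
	"fmt"
	"math"
)

// predictReplicas fits a least-squares line through the stored replica history and
// extrapolates one step ahead. If there are fewer than two data points a regression
// cannot be fitted, so it falls back to the replica count already calculated by the
// HPA logic instead of erroring out.
func predictReplicas(history []float64, calculatedReplicas int32) int32 {
	if len(history) < 2 {
		// Guard: too few data points for a linear regression, skip the prediction.
		return calculatedReplicas
	}
	n := float64(len(history))
	var sumX, sumY, sumXY, sumXX float64
	for i, y := range history {
		x := float64(i)
		sumX += x
		sumY += y
		sumXY += x * y
		sumXX += x * x
	}
	slope := (n*sumXY - sumX*sumY) / (n*sumXX - sumX*sumX)
	intercept := (sumY - slope*sumX) / n
	predicted := slope*n + intercept // extrapolate to the next index
	return int32(math.Ceil(predicted))
}

func main() {
	fmt.Println(predictReplicas(nil, 3))                // 3: falls back, no stack trace
	fmt.Println(predictReplicas([]float64{2, 4, 6}, 3)) // 8: fitted line extrapolated
}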

Kubernetes Details (kubectl version):
v1.20.0

Unable to get CPU resource metrics on EKS

Hi,

Unfortunately I'm really struggling to work out why this won't pick up metrics for my service. Even with logVerbosity: 3 I can't get any useful logs out. Any idea what I'm doing wrong?

I'm on Amazon EKS with Kubernetes version v1.16.8-eks-e16311 and Metrics Server v0.3.7.

I've verified it isn't permissions. Binding cluster-admin to the scaler pod doesn't seem to help and I get a different error when permissions are missing.

Logs:

I0825 10:58:20.014351      15 metric.go:76] Gathering metrics in per-resource mode
I0825 10:58:20.016279      15 metric.go:94] Attempting to run metric gathering logic
I0825 10:58:20.057419      15 shell.go:80] Shell command failed, stderr: 2020/08/25 10:58:20 invalid metrics (1 invalid out of 1), first error is: failed to get resource metric: unable to get metrics for resource cpu: no metrics returned from resource metrics API
E0825 10:58:20.057450      15 main.go:248] exit status 1

Metrics server is working because kubectl top works:

kubectl top pod | grep content-repo-cache
content-repo-cache-566c695fc8-d6zjr                               4m           912Mi           
content-repo-cache-566c695fc8-dg5jj                               40m          954Mi           
content-repo-cache-scaler                                         2m           7Mi    

Here's my YAML:

apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: content-repo-cache-scaler
rules:
- apiGroups:
  - ""
  resources:
  - pods
  - replicationcontrollers
  - replicationcontrollers/scale
  verbs:
  - '*'
- apiGroups:
  - apps
  resources:
  - deployments
  - deployments/scale
  - replicasets
  - replicasets/scale
  - statefulsets
  - statefulsets/scale
  verbs:
  - '*'
- apiGroups:
  - metrics.k8s.io
  resources:
  - '*'
  verbs:
  - '*'
---
apiVersion: custompodautoscaler.com/v1
kind: CustomPodAutoscaler
metadata:
  name: content-repo-cache-scaler
spec:
  template:
    spec:
      containers:
      - name: content-repo-cache-scaler
        image: jthomperoo/predictive-horizontal-pod-autoscaler:v0.5.0
        imagePullPolicy: IfNotPresent
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: content-repo-cache
  provisionRole: true
  config:
    - name: minReplicas
      value: "3"
    - name: maxReplicas
      value: "32"
    - name: logVerbosity
      value: "3"
    - name: predictiveConfig
      value: |
        models:
        - type: HoltWinters
          name: HoltWintersPrediction
          perInterval: 1
          holtWinters:
            alpha: 0.9
            beta: 0.9
            gamma: 0.9
            seasonLength: 4320
            storedSeasons: 4
            method: "additive"
        decisionType: "maximum"
        metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 80
    - name: interval
      value: "20000"
    - name: startTime
      value: "60000"
    - name: downscaleStabilization
      value: "600"

Add monitor mode, allowing a PHPA to be run without actually scaling anything to help with tuning

Is there a way to run this solution in a kind of monitor mode? I want to see the predictions coming out of the operator without applying them. This would allow tuning it smoothly without affecting any real applications. In general, I want to build graphs like the ones in the examples, to see how well my smoothing parameters work before I start applying them to the real deployment.
I was also looking for any metrics that I can scrape from the operator to monitor it via Prometheus. Is there an endpoint for that?

Slow shutdown: when a pod is terminated it takes around 30 seconds for the PHPA to stop

Describe the bug
Slow shutdown: when a pod is terminated it takes around 30 seconds for the PHPA to stop. I think this is because of the entrypoint shell script that runs the PHPA in the Docker image - the SIGTERM signal is not propagated to the Custom Pod Autoscaler binary, so it does not know it is shutting down and keeps running. The 30 second wait is the grace period before Kubernetes sends a SIGKILL, which forces the PHPA to stop.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy a PHPA, for example simple-linear in the examples/ directory.
  2. Wait until it is running with kubectl get pods.
  3. Watch the PHPA logs, for example kubectl logs simple-linear-example --follow.
  4. Delete the PHPA, for example kubectl delete cpa simple-linear-example.
  5. Watch the pods and PHPA logs, it takes around 30 seconds to terminate, and the logs do not print anything about shutting down.

Expected behavior
PHPA should shut down promptly, in line with a normal CPA, and should print out Shutting down... to the logs when the SIGTERM signal is received. As is expected and defined in the CPA here:
https://github.com/jthomperoo/custom-pod-autoscaler/blob/d00bb5f72d382ba07f9b23437e2f8ce844edd18c/cmd/custom-pod-autoscaler/main.go#L239-L247

Basically instead of the build/entrypoint.sh script starting the binary using:

/cpa/custom-pod-autoscaler

It should use:

exec /cpa/custom-pod-autoscaler

(As explained here: https://hynek.me/articles/docker-signals/)
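
For reference, the behaviour that the missing exec breaks is ordinary Go signal handling along these lines; this is a simplified sketch, not the CPA's actual main.go:

package main

import (
	"log"
	"os"
	"os/signal"
	"syscall"
)

func main() {
	// Listen for SIGTERM/SIGINT. Without `exec` in the entrypoint script, the shell
	// receives the signal instead of this process, so this handler never fires and
	// Kubernetes falls back to SIGKILL after the 30 second grace period.
	sigs := make(chan os.Signal, 1)
	signal.Notify(sigs, syscall.SIGTERM, syscall.SIGINT)

	log.Println("Autoscaler running, waiting for shutdown signal...")
	<-sigs
	log.Println("Shutting down...")
	// Clean-up (stopping the scaling loop, flushing state, etc.) would happen here.
}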

Errors in phpa operator pod

Describe the bug
I have these two errors in the phpa operator pod:

  • ERROR failed to get predicted replica count,......,"error": "exit status 1: Traceback (most recent call last):
      File "/app/algorithms/holt_winters/holt_winters.py", line 92, in <module>
        model = sm.ExponentialSmoothing(algorithm_input.series,
      File "/usr/local/lib/python3.8/site-packages/pandas/util/_decorators.py", line 207, in wrapper
        return func(*args, **kwargs)
      File "/usr/local/lib/python3.8/site-packages/statsmodels/tsa/holtwinters/model.py", line 292, in __init__
        self._initialize()
      File "/usr/local/lib/python3.8/site-packages/statsmodels/tsa/holtwinters/model.py", line 432, in _initialize
        return self._initialize_heuristic()
      File "/usr/local/lib/python3.8/site-packages/statsmodels/tsa/holtwinters/model.py", line 447, in _initialize_heuristic
        lvl, trend, seas = _initialization_heuristic(
      File "/usr/local/lib/python3.8/site-packages/statsmodels/tsa/exponential_smoothing/initialization.py", line 59, in _initialization_heuristic
        raise ValueError('Cannot compute initial seasonals using'
    ValueError: Cannot compute initial seasonals using heuristic method with less than two full seasonal cycles in the data."
  • ERROR failed to update PHPA configmap

Predictive Horizontal Pod Autoscaler Version
v0.11.1

To Reproduce
Steps to reproduce the behavior:

  1. Deploy latest phpa version (v0.11.1)
  2. Run with holt_winters model
  3. See error

Expected behavior
The number of pods should be predicted after two or three load tests, without the errors described above.

Kubernetes Details (kubectl version):
v1.20


Add ability to specify start time

Is your feature request related to a problem? Please describe.

In the move from PHPA v0.10.0 to v0.11.0 the PHPA was changed to no longer be based on the Custom Pod Autoscaler framework. This meant that there was no longer built-in support for the startTime field, which was very useful for season-based models such as Holt-Winters (e.g. defining a season as a single day, starting at midnight).

BREAKING CHANGE: Since no longer built as a CustomPodAutoscaler the startTime configuration is no longer available: https://custom-pod-autoscaler.readthedocs.io/en/latest/reference/configuration/#starttime.

Describe the solution you'd like
It would be good if this functionality was implemented directly in the PHPA again.

Since we probably still want a functional autoscaler before this start time, the only change would be that models only start applying after the start time; data would still be gathered and normal HPA operations would still occur (a rough sketch of this is shown below).
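
A rough sketch of that behaviour, with hypothetical names rather than the actual configuration fields:

package main

import (
	"fmt"
	"time"
)

// targetReplicas always returns the HPA-calculated value before startTime, so the
// autoscaler stays functional, and only applies model predictions afterwards.
// Data gathering (appending to the stored history) happens on every sync either way.
func targetReplicas(now, startTime time.Time, hpaCalculated, modelPredicted int32) int32 {
	if now.Before(startTime) {
		return hpaCalculated
	}
	// After startTime, the prediction is applied when it exceeds the calculated value.
	if modelPredicted > hpaCalculated {
		return modelPredicted
	}
	return hpaCalculated
}

func main() {
	// Season aligned to midnight: predictions only start applying from then on.
	midnight := time.Date(2023, 1, 2, 0, 0, 0, 0, time.UTC)
	fmt.Println(targetReplicas(midnight.Add(-time.Hour), midnight, 5, 12)) // 5
	fmt.Println(targetReplicas(midnight.Add(time.Hour), midnight, 5, 12))  // 12
}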

Describe alternatives you've considered
N/A

Additional context
N/A

Metric specs not supplied

Hi, I used this component to run a test. Only one pod is deployed, and the log shows:
15 shell.go:80] Shell command failed, stderr: 2020/06/10 08:30:53 Metric specs not supplied

This follows the short example from your experiment; how can I solve it?

thanks!

Allow fetching Holt-Winters parameters at runtime

The alpha, beta, and gamma values for Holt-Winters are currently predetermined and set at configuration time. A useful feature would be to allow these parameters to be fetched/calculated at runtime - for example with a grid search.

Originally posted by @shubhamitc in #26 (comment)

The proposed solution is to provide some kind of hook functionality that allows configuring an HTTP endpoint or a Python script to determine these values at runtime.
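
A minimal sketch of what such a hook call might look like from the autoscaler side; the endpoint, request payload, and response schema here are hypothetical, not a defined API:

package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// tunedParameters is a hypothetical response shape for a runtime tuning hook.
type tunedParameters struct {
	Alpha float64 `json:"alpha"`
	Beta  float64 `json:"beta"`
	Gamma float64 `json:"gamma"`
}

// fetchHoltWintersParameters posts the recent replica history to an external tuning
// service (which could run a grid search) and returns the tuned smoothing values.
func fetchHoltWintersParameters(url string, history []int32) (*tunedParameters, error) {
	body, err := json.Marshal(map[string]interface{}{"evaluations": history})
	if err != nil {
		return nil, err
	}
	client := &http.Client{Timeout: 2500 * time.Millisecond}
	resp, err := client.Post(url, "application/json", bytes.NewReader(body))
	if err != nil {
		return nil, err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return nil, fmt.Errorf("tuning hook returned status %d", resp.StatusCode)
	}
	params := &tunedParameters{}
	if err := json.NewDecoder(resp.Body).Decode(params); err != nil {
		return nil, err
	}
	return params, nil
}

func main() {
	params, err := fetchHoltWintersParameters("http://tuning.default.svc.cluster.local/holt_winters", []int32{3, 5, 8, 5})
	if err != nil {
		fmt.Println("tuning fetch failed, falling back to configured alpha/beta/gamma:", err)
		return
	}
	fmt.Printf("alpha=%.2f beta=%.2f gamma=%.2f\n", params.Alpha, params.Beta, params.Gamma)
}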

Nil pointers when using Pod metric

Describe the bug

I'm exporting custom metrics using Prometheus under type: Pods and am getting panics from nil pointers.

It's unclear if such metrics are supported here or if I'm just misconfiguring the spec. My HPA (autoscaling/v2beta1) is able to track this metric without issue on EKS

To Reproduce

apiVersion: custompodautoscaler.com/v1
kind: CustomPodAutoscaler
metadata:
  name: linear-cpa
spec:
  template:
    spec:
      containers:
        - name: cpa
          image: jthomperoo/predictive-horizontal-pod-autoscaler:latest
          imagePullPolicy: IfNotPresent
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mything
  roleRequiresMetricsServer: true
  config:
    - name: minReplicas
      value: "1"
    - name: maxReplicas
      value: "3"
    - name: predictiveConfig
      value: |
        models:
        - type: Linear
          name: LinearPrediction
          perInterval: 1
          linear:
            lookAhead: 10000
            storedValues: 20
        decisionType: "mean"
        metrics:
        - type: Pods
          pods:
            metric:
              name: numSessions
            target:
              type: AverageValue
              averageValue: "75"
    - name: interval
      value: "10000"
    - name: downscaleStabilization
      value: "300"

I also tried using a different spec, since I know K8s has tweaked it a lot over the last few releases. Something like:

  metrics:
    - type: Pods
      pods:
        metricName: numSessions
        targetAverageValue: '75'

observed logs:

I1006 19:51:48.228280       1 shell.go:90] Shell command failed, stderr: panic: runtime error: invalid memory address or nil pointer dereference
[signal SIGSEGV: segmentation violation code=0x1 addr=0x10 pc=0x6528c2]

goroutine 1 [running]:
k8s.io/apimachinery/pkg/api/resource.(*Quantity).ScaledValue(0x0, 0xfffffffd, 0x0)
	/home/runner/work/predictive-horizontal-pod-autoscaler/predictive-horizontal-pod-autoscaler/vendor/k8s.io/apimachinery/pkg/api/resource/quantity.go:744 +0x22
k8s.io/apimachinery/pkg/api/resource.(*Quantity).MilliValue(...)
	/home/runner/work/predictive-horizontal-pod-autoscaler/predictive-horizontal-pod-autoscaler/vendor/k8s.io/apimachinery/pkg/api/resource/quantity.go:736
github.com/jthomperoo/horizontal-pod-autoscaler/evaluate/pods.(*Evaluate).GetEvaluation(0xc0001253b0, 0xc000000001, 0xc000390720, 0x14363e0)
	/home/runner/work/predictive-horizontal-pod-autoscaler/predictive-horizontal-pod-autoscaler/vendor/github.com/jthomperoo/horizontal-pod-autoscaler/evaluate/pods/pods.go:52 +0x3f
github.com/jthomperoo/horizontal-pod-autoscaler/evaluate.(*Evaluate).getEvaluation(0xc0004aaa80, 0xc000000001, 0xc000390720, 0x0, 0x0, 0x0)
	/home/runner/work/predictive-horizontal-pod-autoscaler/predictive-horizontal-pod-autoscaler/vendor/github.com/jthomperoo/horizontal-pod-autoscaler/evaluate/evaluate.go:119 +0xfe
github.com/jthomperoo/horizontal-pod-autoscaler/evaluate.(*Evaluate).GetEvaluation(0xc0004aaa80, 0xc00003c580, 0x1, 0x4, 0x10, 0x8, 0x8826d1)

Expected behavior
No nil pointers
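
If the panic is caused by the target's average value being nil when the older spec format is used (an assumption based on the stack trace, not a confirmed diagnosis), a defensive check along these lines would turn the panic into a readable error:

package main

import (
	"fmt"

	"k8s.io/apimachinery/pkg/api/resource"
)

// milliAverageValue safely reads the per-pod average target, returning an error
// instead of dereferencing a nil *resource.Quantity and panicking.
func milliAverageValue(averageValue *resource.Quantity) (int64, error) {
	if averageValue == nil {
		return 0, fmt.Errorf("pods metric target has no averageValue set; check the metric spec format")
	}
	return averageValue.MilliValue(), nil
}

func main() {
	q := resource.MustParse("75")
	v, _ := milliAverageValue(&q)
	fmt.Println(v) // 75000

	if _, err := milliAverageValue(nil); err != nil {
		fmt.Println("error:", err)
	}
}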

Kubernetes Details (kubectl version):

Client Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.2", GitCommit:"092fbfbf53427de67cac1e9fa54aaa09a28371d7", GitTreeState:"clean", BuildDate:"2021-06-16T12:52:14Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"21+", GitVersion:"v1.21.2-eks-0389ca3", GitCommit:"8a4e27b9d88142bbdd21b997b532eb6d493df6d2", GitTreeState:"clean", BuildDate:"2021-07-31T01:34:46Z", GoVersion:"go1.16.5", Compiler:"gc", Platform:"linux/amd64"}


Metrics gatherer fails to retrieve a custom metric for a Deployment when the metric is of Pods type

Describe the bug
There is a Deployment installed by Helm.
The Deployment object's metadata labels include Helm-specific labels.
When querying the custom metrics API server (a Prometheus adapter in this case), the metric gatherer passes along all of the labels from the Deployment metadata. However, the pods managed by that Deployment do not carry the metadata labels assigned to the Deployment by an external agent (Helm).
As a result, the custom metrics server returns an empty list of metrics.

To Reproduce
Steps to reproduce the behavior:

  1. Deploy a deployment using helm.
  2. Create custom pod autoscaler targeting the deployment
  3. Observe errors generated by the CPA, for example:
    I0331 23:06:53.676418 1 shell.go:90] Shell command failed, stderr: 2022/03/31 23:06:53 invalid metrics (1 invalid out of 1), first error is: failed to get pods metric: unable to get metric node_transcoder_gpu_decoder_score: unable to fetch metrics from custom metrics API: the server could not find the metric node_transcoder_gpu_decoder_score for pods
    E0331 23:06:53.676462 1 main.go:289] exit status 1
  4. metric server would return 404:
    I0331 23:08:53.534868 1 httplog.go:104] "HTTP" verb="GET" URI="/apis/custom.metrics.k8s.io/v1beta1/namespaces/default/pods/%2A/node_transcoder_gpu_decoder_score?labelSelector=app.kubernetes.io%2Fmanaged-by%3DHelm%2Ccomponent%3Dtranscoding-agent" latency="11.1253ms" userAgent="predictive-horizontal-pod-autoscaler/v0.0.0 (linux/amd64) kubernetes/$Format" audit-ID="75d9766b-6b1d-422f-8d53-b7fce0d66e8e" srcIP="192.168.65.3:63400" resp=404

Expected behavior
The CPA should not rely on arbitrary labels assigned to the target Deployment. Instead, it should use the label selector that is part of the Pods metric object.
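
A rough sketch of that behaviour, using the autoscaling/v2beta2 types for illustration (not the project's actual gatherer code): prefer the selector declared on the Pods metric itself, and fall back to a provided selector, rather than using the Deployment's metadata labels.

package main

import (
	"fmt"

	autoscalingv2 "k8s.io/api/autoscaling/v2beta2"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/labels"
)

// selectorForPodsMetric builds the label selector used to query the custom metrics API.
// It prefers the selector on the Pods metric, as the HPA does, rather than labels taken
// from the Deployment metadata (which may include Helm labels the pods do not carry).
func selectorForPodsMetric(metric *autoscalingv2.PodsMetricSource, fallback labels.Selector) (labels.Selector, error) {
	if metric.Metric.Selector != nil {
		return metav1.LabelSelectorAsSelector(metric.Metric.Selector)
	}
	return fallback, nil
}

func main() {
	metric := &autoscalingv2.PodsMetricSource{
		Metric: autoscalingv2.MetricIdentifier{
			Name: "node_transcoder_gpu_decoder_score",
			Selector: &metav1.LabelSelector{
				MatchLabels: map[string]string{"component": "transcoding-agent"},
			},
		},
	}
	sel, err := selectorForPodsMetric(metric, labels.Everything())
	if err != nil {
		panic(err)
	}
	fmt.Println(sel.String()) // component=transcoding-agent
}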

Kubernetes Details (kubectl version):
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:38:33Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.5", GitCommit:"5c99e2ac2ff9a3c549d9ca665e7bc05a3e18f07e", GitTreeState:"clean", BuildDate:"2021-12-16T08:32:32Z", GoVersion:"go1.16.12", Compiler:"gc", Platform:"linux/amd64"}
phpa:
git rev-parse master
a8f581d

Upgrade to custom pod autoscaler v0.10.0

Upgrade to custom pod autoscaler v0.10.0

[v0.10.0] - 2020-01-22

Added

  • Set up API to be versioned, starting with v1.
  • Can now manually trigger scaling through the API.
  • Added extra run_type flag, api_dry_run, for evaluations through the API in dry_run mode.
  • Added apiConfig to hold configuration for the REST API.
  • Added extra configuration options within apiConfig.
    • enabled - allows enabling or disabling the API, default enabled (true).
    • useHTTPS - allows enabling or disabling HTTPS for the API, default off (false).
    • certFile - cert file to be used if HTTPS is enabled.
    • keyFile - key file to be used if HTTPS is enabled.

Changed

  • The command for shell methods is now an array of arguments, rather than a string.
  • The /api/v1/evaluation endpoint now requires POST rather than GET.
  • The /api/v1/evaluation endpoint now accepts an optional parameter, dry_run. If dry_run is true the evaluation will be retrieved in a read-only manner, the scaling will not occur. If it is false, or not provided, the evaluation will be retrieved and then used to apply scaling to the target.
  • Moved port and host configuration options into the apiConfig settings.

[v0.9.0] - 2020-01-19

Added

  • Support for other entrypoints other than /bin/sh, can specify an entrypoint for the shell command method.
  • Add logging library glog to allow logging at levels of severity and verbosity.
  • Can specify verbosity level of logs via the logVerbosity configuration option.

Changed

  • Can scale ReplicaSets, ReplicationControllers and StatefulSets alongside Deployments.
  • ResourceMetrics fields have resourceName and resource rather than deploymentName and deployment. In JSON this means that only the resource name will be exposed via field resource.
  • Uses scaling API rather than manually adjusting replica count on resource.
  • Matches using match selector rather than incorrectly using resource labels and building a different selector.

ERROR Reconciler error

Describe the bug
I have this error in the phpa operator pod:
/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:234
1.6817378505120962e+09 ERROR Reconciler error
{"controller": "predictivehorizontalpodautoscaler", "controllerGroup": "jamiethompson.me", "controllerKind": "PredictiveHorizontalPodAutoscaler", "predictiveHorizontalPodAutoscaler": {"name":"simple-linear","namespace":"default"}, "namespace": "default", "name": "simple-linear", "reconcileID": "9b4b2166-a51d-465e-aa4c-78d9f480f2d1", "error": "failed to update status of resource: PredictiveHorizontalPodAutoscaler.jamiethompson.me "simple-linear" is invalid: [status.scaleDownReplicaHistory: Invalid value: "null": status.scaleDownReplicaHistory in body must be of type array: "null", status.scaleUpReplicaHistory: Invalid value: "null": status.scaleUpReplicaHistory in body must be of type array: "null", status.scaleUpEventHistory: Invalid value: "null": status.scaleUpEventHistory in body must be of type array: "null", status.currentMetrics: Invalid value: "null": status.currentMetrics in body must be of type array: "null", status.scaleDownEventHistory: Invalid value: "null": status.scaleDownEventHistory in body must be of type array: "null"]"}

Predictive Horizontal Pod Autoscaler Version
0.13.0

To Reproduce
Steps to reproduce the behavior:

  1. Deploy 'phpa.yaml' and 'deployment.yaml'
  2. Run 'kubectl logs -l name=predictive-horizontal-pod-autoscaler -f'
  3. See error

Kubernetes Details (kubectl version):
kubernetes 1.19.2

