
krr's Introduction


Kubernetes Resource Recommendations Based on Historical Data

Get recommendations based on your existing data in Prometheus/Coralogix/Thanos/Mimir and more!

Installation · How KRR works · Slack Integration · Free KRR UI
Usage · Report Bug · Request Feature · Support
Like KRR? Please ⭐ this repository to show your support!

About The Project

Robusta KRR (Kubernetes Resource Recommender) is a CLI tool for optimizing resource allocation in Kubernetes clusters. It gathers pod usage data from Prometheus and recommends requests and limits for CPU and memory. This reduces costs and improves performance.

Data Integrations

Used to send data to KRR

View instructions for: Prometheus, Thanos, Victoria Metrics, Google Managed Prometheus, Amazon Managed Prometheus, Azure Managed Prometheus, Coralogix, Grafana Cloud and Grafana Mimir

Reporting Integrations

Used to receive information from KRR

View instructions for: Seeing recommendations in a UI, Sending recommendations to Slack, Setting up KRR as a k9s plugin

Features

  • No Agent Required: Run a CLI tool on your local machine for immediate results. (Or run in-cluster for weekly Slack reports.)
  • Prometheus Integration: Get recommendations based on the data you already have
  • Explainability: Understand how recommendations were calculated with explanation graphs
  • Extensible Strategies: Easily create and use your own strategies for calculating resource recommendations.
  • Free SaaS Platform: See why KRR recommends what it does, by using the free Robusta SaaS platform.
  • Future Support: Upcoming versions will support custom resources (e.g. GPUs) and custom metrics.

How Much Can I Expect to Save with KRR?

According to a recent Sysdig study, on average, Kubernetes clusters have:

  • 69% unused CPU
  • 18% unused memory

By right-sizing your containers with KRR, you can save an average of 69% on cloud costs.

Read more about how KRR works

Difference with Kubernetes VPA

Feature | Robusta KRR 🚀 | Kubernetes VPA 🌐
Resource Recommendations 💡 | ✅ CPU/Memory requests and limits | ✅ CPU/Memory requests and limits
Installation Location 🌍 | ✅ Not required to be installed inside the cluster; can be used on your own device, connected to a cluster | ❌ Must be installed inside the cluster
Workload Configuration 🔧 | ✅ No need to configure a VPA object for each workload | ❌ Requires VPA object configuration for each workload
Immediate Results ⚡ | ✅ Gets results immediately (given Prometheus is running) | ❌ Requires time to gather data and provide recommendations
Reporting 📊 | ✅ JSON, CSV, Markdown, Web UI, and more | ❌ Not supported
Extensibility 🔧 | ✅ Add your own strategies with a few lines of Python | ⚠️ Limited extensibility
Explainability 📖 | ✅ See graphs explaining the recommendations | ❌ Not supported
Custom Metrics 📏 | 🔄 Support in future versions | ❌ Not supported
Custom Resources 🎛️ | 🔄 Support in future versions (e.g., GPU) | ❌ Not supported
Autoscaling 🔀 | 🔄 Support in future versions | ✅ Automatic application of recommendations
Default History 🕒 | 14 days | 8 days
Supports HPA 🔥 | ✅ Enable using the --allow-hpa flag | ❌ Not supported

Installation

Requirements

KRR requires Prometheus 2.26+, kube-state-metrics & cAdvisor.

Which metrics does KRR need? No setup is required if you use kube-prometheus-stack or Robusta's Embedded Prometheus.

If you have a different setup, make sure the following metrics exist:

  • container_cpu_usage_seconds_total
  • container_memory_working_set_bytes
  • kube_replicaset_owner
  • kube_pod_owner
  • kube_pod_status_phase

Note: If one of the last three metrics is absent, KRR will still work, but it will only consider currently-running pods when calculating recommendations. Historic pods that no longer exist in the cluster will not be taken into consideration.
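If you want to verify these metrics exist before running KRR, here is a minimal sketch (not part of KRR) that checks each one against the Prometheus HTTP API; the URL is a placeholder for your own Prometheus endpoint:

import requests

# Placeholder URL -- point this at your Prometheus instance (e.g. a local port-forward).
PROMETHEUS_URL = "http://localhost:9090"

REQUIRED_METRICS = [
    "container_cpu_usage_seconds_total",
    "container_memory_working_set_bytes",
    "kube_replicaset_owner",
    "kube_pod_owner",
    "kube_pod_status_phase",
]

for metric in REQUIRED_METRICS:
    # An instant query that returns any series proves the metric is being scraped.
    resp = requests.get(f"{PROMETHEUS_URL}/api/v1/query", params={"query": metric})
    resp.raise_for_status()
    found = bool(resp.json()["data"]["result"])
    print(f"{metric}: {'found' if found else 'MISSING'}")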

Installation Methods

Brew (Mac/Linux)
  1. Add our tap:
brew tap robusta-dev/homebrew-krr
  2. Install KRR:
brew install krr
  3. Check that installation was successful:
krr --help
  4. Run KRR (first launch might take a little longer):
krr simple
Windows

You can install using brew (see above) on WSL2, or install from source (see below).

Airgapped Installation (Offline Environments)

You can download pre-built binaries from Releases or use the prebuilt Docker container. For example, the container for version 1.8.3 is:

us-central1-docker.pkg.dev/genuine-flight-317411/devel/krr:v1.8.3

We do not recommend installing KRR from source in airgapped environments due to the headache of installing Python dependencies. Use one of the above methods instead and contact us (via Slack, GitHub issues, or email) if you need assistance.

From Source
  1. Make sure you have Python 3.9 (or greater) installed
  2. Clone the repo:
git clone https://github.com/robusta-dev/krr
  3. Navigate to the project root directory (cd ./krr)
  4. Install requirements:
pip install -r requirements.txt
  5. Run the tool:
python krr.py --help

Note that installing from source requires running KRR as a Python script, while installing with brew provides the krr command. All examples above use krr ...; replace it with python krr.py ... if you are using a manual installation.

Additional Options

Environment-Specific Instructions

Setup KRR for...

(back to top)

Trusting a custom Certificate Authority (CA) certificate:

If the URL you are connecting to uses a certificate from a custom CA, base64-encode the certificate and store it in an environment variable named CERTIFICATE so that it is trusted.
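For example, a minimal way to produce that base64 value with Python (the file name ca.crt is a placeholder for your CA certificate):

import base64

# Read the custom CA certificate (placeholder path) and print its base64
# encoding, suitable for the CERTIFICATE environment variable.
with open("ca.crt", "rb") as f:
    print(base64.b64encode(f.read()).decode())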

Free KRR UI on Robusta SaaS

We highly recommend using the free Robusta SaaS platform. You can:

  • Understand individual app recommendations with app usage history

  • Sort and filter recommendations by namespace, priority, and more

  • Give devs a YAML snippet to fix the problems KRR finds

  • Analyze impact using KRR scan history

Usage

Basic usage
krr simple
Tweak the recommendation algorithm (strategy)

Most helpful flags:

  • --cpu-min Sets the minimum recommended cpu value in millicores
  • --mem-min Sets the minimum recommended memory value in MB
  • --history_duration The duration of the Prometheus history data to use (in hours)

More specific information on Strategy Settings can be found using

krr simple --help
Giving an Explicit Prometheus URL

If your Prometheus is not auto-connecting, you can use kubectl port-forward to manually forward Prometheus.

For example, if you have a Prometheus Pod called kube-prometheus-st-prometheus-0, then run this command to port-forward it:

kubectl port-forward pod/kube-prometheus-st-prometheus-0 9090

Then, open another terminal and run krr in it, giving an explicit Prometheus url:

krr simple -p http://127.0.0.1:9090
Run on specific namespaces

List as many namespaces as you want with -n (in this case, default and ingress-nginx)

krr simple -n default -n ingress-nginx

See example ServiceAccount and RBAC permissions

Run on workloads filtered by label

Use a label selector

python krr.py simple --selector 'app.kubernetes.io/instance in (robusta, ingress-nginx)'
Override the kubectl context

By default krr will run in the current context. If you want to run it in a different context:

krr simple -c my-cluster-1 -c my-cluster-2
Output formats for reporting (JSON, YAML, CSV, and more)

Currently KRR ships with a few formatters to represent the scan data:

  • table - a pretty CLI table used by default, powered by the Rich library
  • json
  • yaml
  • pprint - data representation from python's pprint library
  • csv - export data to a csv file in the current directory

To run a strategy with a selected formatter, add the -f flag. Usually this should be combined with --fileoutput <filename> to write clean output to a file without logs:

krr simple -f json --fileoutput krr-report.json

If you prefer, you can also use --logtostderr to get clean formatted output in one file and error logs in another:

krr simple --logtostderr -f json > result.json 2> logs-and-errors.log
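Because the JSON report is plain JSON, it is easy to post-process. A minimal sketch that loads the file produced above and lists its top-level structure (no assumptions are made here about the report's exact schema):

import json

# Load the report written by `krr simple -f json --fileoutput krr-report.json`.
with open("krr-report.json") as f:
    report = json.load(f)

# Print the top-level keys (or the element count, if the report is a list).
print(list(report.keys()) if isinstance(report, dict) else len(report))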
Centralized Prometheus (multi-cluster)

See below for filtering output from a centralized Prometheus so that it matches only one cluster.

Prometheus Authentication

KRR supports all known authentication schemes for Prometheus, VictoriaMetrics, Coralogix, and other Prometheus-compatible metric stores.

Refer to krr simple --help, and look at the flags --prometheus-url, --prometheus-auth-header, --prometheus-headers, --prometheus-ssl-enabled, --coralogix-token, and the various --eks-* flags.

If you need help, contact us on Slack, email, or by opening a GitHub issue.

Debug mode: if you want to see additional debug logs, run:
krr simple -v

(back to top)

How KRR works

Metrics Gathering

Robusta KRR uses the following Prometheus queries to gather usage data:

  • CPU Usage:

    sum(irate(container_cpu_usage_seconds_total{namespace="{object.namespace}", pod="{pod}", container="{object.container}"}[{step}]))
    
  • Memory Usage:

    sum(container_memory_working_set_bytes{job="kubelet", metrics_path="/metrics/cadvisor", image!="", namespace="{object.namespace}", pod="{pod}", container="{object.container}"})
    

Need to customize the metrics? Tell us and we'll add support.
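To try these queries by hand, here is a minimal sketch over the Prometheus HTTP range-query API; the URL, namespace, pod, and container values are placeholders and must be replaced with your own:

import datetime as dt

import requests

# Placeholder values -- substitute your own Prometheus URL and workload names.
PROMETHEUS_URL = "http://localhost:9090"
QUERY = (
    'sum(irate(container_cpu_usage_seconds_total{'
    'namespace="default", pod="my-pod", container="my-container"}[5m]))'
)

end = dt.datetime.now(dt.timezone.utc)
start = end - dt.timedelta(hours=1)

resp = requests.get(
    f"{PROMETHEUS_URL}/api/v1/query_range",
    params={"query": QUERY, "start": start.timestamp(), "end": end.timestamp(), "step": "5m"},
)
resp.raise_for_status()
print(resp.json()["data"]["result"])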

Get a free breakdown of KRR recommendations in the Robusta SaaS.

Algorithm

By default, we use a simple strategy to calculate resource recommendations. It is calculated as follows (the exact numbers can be customized via CLI arguments):

  • For CPU, we set a request at the 95th percentile with no limit. Meaning, in 95% of the cases, your CPU request will be sufficient. For the remaining 5%, we set no limit. This means your pod can burst and use any CPU available on the node - e.g. CPU that other pods requested but aren’t using right now.

  • For memory, we take the maximum value over the past week and add a 15% buffer.
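As a rough illustration (not KRR's actual code), the core of such a strategy can be sketched in a few lines of Python; the percentile and buffer values are the configurable parameters mentioned above:

def simple_recommendation(cpu_samples, memory_samples, cpu_percentile=0.95, memory_buffer=0.15):
    """Toy version of the 'simple' strategy: percentile-based CPU request,
    no CPU limit, and max-plus-buffer memory request/limit."""
    if not cpu_samples or not memory_samples:
        return None  # no usage data for this container

    cpu_sorted = sorted(cpu_samples)
    index = min(len(cpu_sorted) - 1, int(cpu_percentile * len(cpu_sorted)))
    cpu_request = cpu_sorted[index]

    memory = max(memory_samples) * (1 + memory_buffer)
    return {
        "cpu": {"request": cpu_request, "limit": None},
        "memory": {"request": memory, "limit": memory},
    }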

Prometheus connection

Read about how KRR tries to find the default Prometheus to connect to here.

(back to top)

Data Source Integrations

Prometheus, Victoria Metrics and Thanos auto-discovery

By default, KRR will try to auto-discover the running Prometheus, Victoria Metrics, and Thanos. For discovering Prometheus it scans services for these labels:

"app=kube-prometheus-stack-prometheus"
"app=prometheus,component=server"
"app=prometheus-server"
"app=prometheus-operator-prometheus"
"app=rancher-monitoring-prometheus"
"app=prometheus-prometheus"

For Thanos, it scans for these labels:

"app.kubernetes.io/component=query,app.kubernetes.io/name=thanos",
"app.kubernetes.io/name=thanos-query",
"app=thanos-query",
"app=thanos-querier",

And for Victoria Metrics, it scans for the following labels:

"app.kubernetes.io/name=vmsingle",
"app.kubernetes.io/name=victoria-metrics-single",
"app.kubernetes.io/name=vmselect",
"app=vmselect",

If none of those labels result in finding Prometheus, Victoria Metrics, or Thanos, you will get an error and will have to pass the working URL explicitly (using the -p flag).
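The discovery itself is plain Kubernetes service listing by label. A minimal sketch of the same idea with the official Python client (the label selector below is one of those listed above):

from kubernetes import client, config

# Load the local kubeconfig and search every namespace for a service carrying
# one of the labels KRR looks for during Prometheus auto-discovery.
config.load_kube_config()
v1 = client.CoreV1Api()
services = v1.list_service_for_all_namespaces(
    label_selector="app=kube-prometheus-stack-prometheus"
)
for svc in services.items:
    print(f"{svc.metadata.namespace}/{svc.metadata.name}")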

(back to top)

Scanning with a Centralized Prometheus

If your Prometheus monitors multiple clusters, you must specify the label you defined for your cluster in Prometheus.

For example, if your cluster has the Prometheus label cluster: "my-cluster-name", then run this command:

krr.py simple --prometheus-label cluster -l my-cluster-name

You may also need the -p flag to explicitly give Prometheus' URL.

Azure Managed Prometheus

For Azure managed Prometheus you need to generate an access token, which can be done by running the following command:

# If you are not logged in to Azure, uncomment out the following line
# az login
AZURE_BEARER=$(az account get-access-token --resource=https://prometheus.monitor.azure.com  --query accessToken --output tsv); echo $AZURE_BEARER

Then run the following command, with PROMETHEUS_URL substituted for your Azure Managed Prometheus URL:

python krr.py simple --namespace default -p PROMETHEUS_URL --prometheus-auth-header "Bearer $AZURE_BEARER"

See here about configuring labels for centralized Prometheus

(back to top)

Google Managed Prometheus (GMP)

Please find the detailed GMP usage instructions here

(back to top)

Amazon Managed Prometheus

For Amazon Managed Prometheus, you need to add your Prometheus link and the flag --eks-managed-prom; krr will automatically use your AWS credentials.

python krr.py simple -p "https://aps-workspaces.REGION.amazonaws.com/workspaces/..." --eks-managed-prom

Additional optional parameters are:

--eks-profile-name PROFILE_NAME_HERE # to specify the profile to use from your config
--eks-access-key ACCESS_KEY # to specify your access key
--eks-secret-key SECRET_KEY # to specify your secret key
--eks-service-name SERVICE_NAME # to use a specific service name in the signature
--eks-managed-prom-region REGION_NAME # to specify the region the Prometheus is in

See here about configuring labels for centralized Prometheus

(back to top)

Coralogix Managed Prometheus

For Coralogix managed Prometheus, you need to specify your Prometheus link and add the coralogix_token flag with your Logs Query Key:

python krr.py simple -p "https://prom-api.coralogix..." --coralogix_token

See here about configuring labels for centralized Prometheus

(back to top)

Grafana Cloud Managed Prometheus

For Grafana Cloud managed Prometheus, you need to specify the Prometheus link, the Prometheus user, and an access token for your Grafana Cloud stack. The Prometheus link and user for the stack can be found on the Grafana Cloud Portal. An access token with a metrics:read scope can be created using Access Policies on the same portal.

Next, run the following command, after setting the values of PROM_URL, PROM_USER, and PROM_TOKEN variables with your Grafana Cloud stack's Prometheus link, Prometheus user, and access token.

python krr.py simple -p $PROM_URL --prometheus-auth-header "Bearer ${PROM_USER}:${PROM_TOKEN}" --prometheus-ssl-enabled

See here about configuring labels for centralized Prometheus

(back to top)

Grafana Mimir auto-discovery

By default, KRR will try to auto-discover the running Grafana Mimir.

For discovering Mimir, it scans services for these labels:

  "app.kubernetes.io/name=mimir,app.kubernetes.io/component=query-frontend"

(back to top)

Integrations

Free UI for KRR recommendations

We highly recommend using the free Robusta SaaS platform. You can:

  • Understand individual app recommendations with app usage history

  • Sort and filter recommendations by namespace, priority, and more

  • Give devs a YAML snippet to fix the problems KRR finds

  • Analyze impact using KRR scan history

Slack Notification

Put cost savings on autopilot. Get notified in Slack about recommendations above X%. Send a weekly global report, or one report per team.

Slack Screen Shot

Prerequisites

  • A Slack workspace

Setup

  1. Install Robusta with Helm to your cluster and configure Slack
  2. Create your KRR Slack playbook by adding the following to generated_values.yaml:
customPlaybooks:
# Runs a weekly krr scan on the namespace devs-namespace and sends it to the configured slack channel
- triggers:
  - on_schedule:
      fixed_delay_repeat:
        repeat: -1 # number of times to run or -1 to run forever
        seconds_delay: 604800 # 1 week
  actions:
  - krr_scan:
      args: "--namespace devs-namespace" ## KRR args here
  sinks:
      - "main_slack_sink" # slack sink you want to send the report to here
  3. Do a Helm upgrade to apply the new values: helm upgrade robusta robusta/robusta --values=generated_values.yaml --set clusterName=<YOUR_CLUSTER_NAME>

(back to top)

k9s Plugin

Install our k9s Plugin to get recommendations directly in deployments/daemonsets/statefulsets views.

Plugin: resource recommender

Installation instructions: k9s docs

Creating a Custom Strategy/Formatter

Look into the examples directory for examples on how to create a custom strategy/formatter.
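The examples directory is the authoritative reference. Purely to illustrate the shape of a strategy (the names below are hypothetical and are not KRR's real base classes), a strategy is essentially a function from usage history to recommended resources:

from dataclasses import dataclass

# Hypothetical sketch only; see the examples directory for the real base
# classes and registration mechanism used by KRR.

@dataclass
class Recommendation:
    cpu_request: float     # cores
    memory_request: float  # bytes

def average_based_strategy(cpu_samples: list[float], memory_samples: list[float]) -> Recommendation:
    """Toy strategy: request the average observed CPU and the peak observed memory."""
    cpu = sum(cpu_samples) / len(cpu_samples) if cpu_samples else 0.0
    memory = max(memory_samples) if memory_samples else 0.0
    return Recommendation(cpu_request=cpu, memory_request=memory)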

(back to top)

Testing

We use pytest to run tests.

  1. Install the project manually (see above)
  2. Navigate to the project root directory
  3. Install poetry (https://python-poetry.org/docs/#installing-with-the-official-installer)
  4. Install dev dependencies:
poetry install --group dev
  5. Install robusta_krr as an editable dependency:
pip install -e .
  6. Run the tests:
poetry run pytest

(back to top)

Contributing

Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

If you have a suggestion that would make this better, please fork the repo and create a pull request. You can also simply open an issue with the tag "enhancement". Don't forget to give the project a star! Thanks again!

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

(back to top)

License

Distributed under the MIT License. See LICENSE.txt for more information.

(back to top)

Support

If you have any questions, feel free to contact [email protected] or message us on robustacommunity.slack.com

(back to top)

krr's People

Contributors

aantn, arikalon1, arnoldyahad, avi-robusta, bpfoster, chicocvenancio, clementgautier, cr7258, dazwilkin, dgdevops, evertonsa, fenio, frankfoerster24, ganeshrvel, haad, joaopedrocg27, leavemyyard, mamykola, mrueg, pablos44, pavangudiwada, reason2010, roiglinik, serdarkkts, sheeproid, shlomosfez, tlipoca9, vahan90, whaakman, yonahd


krr's Issues

New installation method: asdf

Is your feature request related to a problem? Please describe.
Linux users have to install krr manually via python or with homebrew, ..sad times.

Describe the solution you'd like
asdf is the ultimate package manager for CLI tools (it's great, you should all be using it), I would love to see an official krr plugin for asdf

https://asdf-vm.com/

Describe alternatives you've considered
If krr can be published as a simple binary, that would also be nice.

Additional context

KRR install missing python modules aiostream and slack-sdk

After manually installing KRR https://github.com/robusta-dev/krr#manual-installation, I noticed the following modules were missing:

  • aiostream
  • slack-sdk

To Reproduce

  1. Follow manual install instructions
  2. See the error for slack-sdk. (if you manually install via pip the error goes away https://pypi.org/project/slack-sdk/)
  3. If step 2 is resolved using workaround you will then see the error for aiostream. (if you manually install via pip error goes away https://pypi.org/project/aiostream/)

Expected behavior
slack-sdk and aiostream installed properly via requirements.txt file.

Suggested Fix
Add these 2 dependencies slack-sdk and aiostream to the requirements.txt file

Error Messages

slack-sdk

python3 krr.py --help          
               
Traceback (most recent call last):
  File "/Users/calvincarter/_programming/docker/development/krr_install/krr/krr.py", line 1, in <module>
    from robusta_krr import run
  File "/Users/calvincarter/_programming/docker/development/krr_install/krr/robusta_krr/__init__.py", line 1, in <module>
    from .main import run
  File "/Users/calvincarter/_programming/docker/development/krr_install/krr/robusta_krr/main.py", line 16, in <module>
    from robusta_krr.core.runner import Runner
  File "/Users/calvincarter/_programming/docker/development/krr_install/krr/robusta_krr/core/runner.py", line 6, in <module>
    from slack_sdk import WebClient
ModuleNotFoundError: No module named 'slack_sdk'

aiostream

 python3 krr.py --help 
Traceback (most recent call last):
  File "/Users/calvincarter/_programming/docker/development/krr_install/krr/krr.py", line 1, in <module>
    from robusta_krr import run
  File "/Users/calvincarter/_programming/docker/development/krr_install/krr/robusta_krr/__init__.py", line 1, in <module>
    from .main import run
  File "/Users/calvincarter/_programming/docker/development/krr_install/krr/robusta_krr/main.py", line 16, in <module>
    from robusta_krr.core.runner import Runner
  File "/Users/calvincarter/_programming/docker/development/krr_install/krr/robusta_krr/core/runner.py", line 10, in <module>
    from robusta_krr.core.integrations.kubernetes import KubernetesLoader
  File "/Users/calvincarter/_programming/docker/development/krr_install/krr/robusta_krr/core/integrations/kubernetes.py", line 4, in <module>
    import aiostream
ModuleNotFoundError: No module named 'aiostream'

Desktop (please complete the following information):

  • OS: [Mac]
  • Version [Ventura 13.5]

Add support for DataDog metrics instead of Prometheus

Our organization no longer uses Prometheus, so I am very curious what would it take to integrate with DataDog as a metrics source, either as what they call an "integration" or just using plain old DataDog API.

Add long term storage support (Thanos)

We are currently storing just 24h in our local Prometheus and using Thanos for anything longer.

It would be cool to use Thanos by filtering results based on a custom label set.

Add support to Grafana cloud

Is your feature request related to a problem? Please describe.
We scrape the metrics using Grafana agent which sends data to Grafana cloud.

Describe the solution you'd like
It would be great if you could allow passing Grafana cloud Prometheus query endpoint with username and password.
something like below

krr -p https://prometheus-xxx.grafana.net/api/prom -u <username> -p <password>

Typo in setup for Azure Prometheus in README

In the README.md file there is

# If you are not logged in to Azure, uncomment out the following line
# az login
AZURE_BEARER=$(az account get-access-token --resource=https://prometheus.monitor.azure.com  --query accesssToken --output tsv); echo $AZURE_BEARER 

But there should be (see the --query parameter)

# If you are not logged in to Azure, uncomment out the following line
# az login
AZURE_BEARER=$(az account get-access-token --resource=https://prometheus.monitor.azure.com  --query accessToken --output tsv); echo $AZURE_BEARER 

Just a typo, but it wastes time of someone to troubleshoot, why the token does not get fetched.

Also the next command in the same sections, that utilizes the token, should be IMO

krr simple -p PROMETHEUS_URL --prometheus-auth-header "Bearer $AZURE_BEARER"

All other examples use the krr command directly, and the usage of namespace is misleading.

Add support for rollouts

Hey everyone,
We are mostly using argocd rollout resource instead of deployments:

apiVersion: argoproj.io/v1alpha1
kind: Rollout

The Rollout of ArgoCD is a popular CRD used by most people who use ArgoCD; it's basically a Deployment but with extra capabilities.

It would be great if the krr tool could support it so it will show results of Rollouts as well.

Using VictoriaMetrics leads to `Connection reset by peer` after 2 minutes

Describe the bug
Using VictoriaMetrics leads to Connection reset by peer after 2 minutes.

To Reproduce
Steps to reproduce the behavior:

  1. krr simple --verbose --prometheus-url https://vmselect.test.com/select/0/prometheus
  2. See error:
During handling of the above exception, another exception occurred:

╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /Users/furkan.turkal/robusta_krr/core/runner.py:202 in run                                       │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/Users/furkan.turkal/robusta_krr/core/runner.py'           │
│                                                                                                  │
│ /Users/furkan.turkal/robusta_krr/core/runner.py:177 in _collect_result                           │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/Users/furkan.turkal/robusta_krr/core/runner.py'           │
│                                                                                                  │
│ /Users/furkan.turkal/robusta_krr/core/runner.py:138 in _gather_objects_recommendations           │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/Users/furkan.turkal/robusta_krr/core/runner.py'           │
│                                                                                                  │
│ /Users/furkan.turkal/robusta_krr/core/runner.py:113 in _calculate_object_recommendations         │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/Users/furkan.turkal/robusta_krr/core/runner.py'           │
│                                                                                                  │
│ /Users/furkan.turkal/robusta_krr/core/integrations/prometheus/loader.py:97 in gather_data        │
│                                                                                                  │
│ [Errno 2] No such file or directory:                                                             │
│ '/Users/furkan.turkal/robusta_krr/core/integrations/prometheus/loader.py'                        │
│                                                                                                  │
│                                     ... 6 frames hidden ...                                      │
│                                                                                                  │
│ /Users/furkan.turkal/requests/sessions.py:635 in post                                            │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/Users/furkan.turkal/requests/sessions.py'                 │
│                                                                                                  │
│ /Users/furkan.turkal/requests/sessions.py:587 in request                                         │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/Users/furkan.turkal/requests/sessions.py'                 │
│                                                                                                  │
│ /Users/furkan.turkal/requests/sessions.py:745 in send                                            │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/Users/furkan.turkal/requests/sessions.py'                 │
│                                                                                                  │
│ /Users/furkan.turkal/requests/models.py:899 in content                                           │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/Users/furkan.turkal/requests/models.py'                   │
│                                                                                                  │
│ /Users/furkan.turkal/requests/models.py:818 in generate                                          │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/Users/furkan.turkal/requests/models.py'                   │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
ChunkedEncodingError: ("Connection broken: ConnectionResetError(54, 'Connection reset by peer')", ConnectionResetError(54, 'Connection reset by peer'))

Expected behavior
Maybe worth adding some resiliency with retries and timeouts.

Screenshots
-

Desktop (please complete the following information):

  • OS: macOS
  • Browser -
  • Version v1.4.0

error when context not set

Describe the bug
A clear and concise description of what the bug is.

i work with more than a couple of clusters. to avoid confusing myself i do not set contexts forcing myself to do that for each kubectl command i write.

which means i get this error:

kubernetes.config.config_exception.ConfigException: Invalid kube-config file. Expected object with name in ${HOME}/.kube/config/contexts list

i had to use kubectx to set context for krr to work

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior
A clear and concise description of what you expected to happen.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: [e.g. iOS]
  • Browser [e.g. chrome, safari]
  • Version [e.g. 22]

Smartphone (please complete the following information):

  • Device: [e.g. iPhone6]
  • OS: [e.g. iOS8.1]
  • Browser [e.g. stock browser, safari]
  • Version [e.g. 22]

Additional context
Add any other context about the problem here.

brew install on linux references the wrong asset ...

$ brew install krr
==> Fetching robusta-dev/krr/krr
Error: krr: Failed to download resource "krr"
Failure while executing; `/usr/bin/env /home/linuxbrew/.linuxbrew/Homebrew/Library/Homebrew/shims/shared/curl --disable --cookie /dev/null --globoff --show-error --user-agent Linuxbrew/4.0.27\ \(Linux\;\ x86_64\ Debian\ GNU/Linux\ 12\ \(bookworm\)\)\ curl/7.88.1 --header Accept-Language:\ en --retry 3 --fail --location --silent --head --request GET https://github.com/robusta-dev/krr/releases/download/v1.3.2/krr-linux-latest-v1.3.2.zip` exited with 22. Here's the output:
HTTP/2 404 
server: GitHub.com
date: Thu, 06 Jul 2023 15:17:29 GMT
content-type: text/plain; charset=utf-8
vary: X-PJAX, X-PJAX-Container, Turbo-Visit, Turbo-Frame, Accept-Encoding, Accept, X-Requested-With
cache-control: no-cache
strict-transport-security: max-age=31536000; includeSubdomains; preload
x-frame-options: deny
x-content-type-options: nosniff
x-xss-protection: 0
referrer-policy: no-referrer-when-downgrade
content-security-policy: default-src 'none'; base-uri 'self'; connect-src 'self'; form-action 'self'; img-src 'self' data:; script-src 'self'; style-src 'unsafe-inline'
content-length: 9
x-github-request-id: 4CF5:77F0:3E5E2C6:3F1ED1F:64A6DB09

curl: (22) The requested URL returned error: 404


Use Vertical Pod Autoscaler as an additional data source to enrich the recommendation algorithm

As the current implementation of krr relies heavily on Prometheus metrics, I was thinking of how we can also benefit from the VPA (in recommendation mode, if CRDs are installed) as an additional data source to enrich the recommendation system by making high-precision calculations (as goldilocks did).

Implementation would be quite simple:

  • Get the VPA
  • Check the VPA.Status.Recommendation.ContainerRecommendations
  • Get the Prometheus metrics (as-is)
  • Do some magic and calculation stuff (combine two different data sources)
  • Finalize the recommended values

Any thoughts?

Krr 1.5.3 returns no results due to no metrics for PercentileCPULoader and MaxMemoryLoader

Describe the bug

I updated the tool to the 1.5.3 version. When I executed the simple strategy, it returned no results:

> krr simple --namespace <namespace> --selector="app = grafana"


 _____       _               _          _  _______  _____
|  __ \     | |             | |        | |/ /  __ \|  __ \
| |__) |___ | |__  _   _ ___| |_ __ _  | ' /| |__) | |__) |
|  _  // _ \| '_ \| | | / __| __/ _` | |  < |  _  /|  _  /
| | \ \ (_) | |_) | |_| \__ \ || (_| | | . \| | \ \| | \ \
|_|  \_\___/|_.__/ \__,_|___/\__\__,_| |_|\_\_|  \_\_|  \_\



Running Robusta's KRR (Kubernetes Resource Recommender) v1.5.3
Using strategy: Simple
Using formatter: table

[INFO] Using clusters: ['<cluster>']
on 0: [INFO] Listing scannable objects in <cluster>
on 0: [INFO] Connecting to Prometheus for <cluster> cluster
on 0: [INFO] Using Prometheus at https://<server>/api/v1/namespaces/<namespace>/services/prometheus-server-service:9091/proxy for cluster <cluster>   
on 0: [INFO] Prometheus found
on 0: [INFO] Prometheus connected successfully for <cluster> cluster
on 0: [WARNING] Prometheus returned no PercentileCPULoader metrics for Deployment <namespace>/grafana-deployment/grafana
on 0: [WARNING] Prometheus returned no MaxMemoryLoader metrics for Deployment <namespace>/grafana-deployment/grafana
Calculating Recommendation |████████████████████████████████████████| 1 in 15.4s (0.07/s)



Simple Strategy

CPU request: 99.0% percentile, limit: unset
Memory request: max + 5.0%, limit: max + 5.0%

This strategy does not work with objects with HPA defined (Horizontal Pod Autoscaler).
If HPA is defined for CPU or Memory, the strategy will return "?" for that resource.

Learn more: https://github.com/robusta-dev/krr#algorithm

┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━┓
┃ Number ┃ Namespace        ┃ Name             ┃ Pods ┃ Old Pods ┃ Type       ┃ Container ┃ CPU Diff ┃ CPU Requests     ┃ CPU Limits       ┃ Memory Diff ┃ Memory Requests  ┃ Memory Limits     ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━┩
│     1. │ <namespace>      │ grafana-deploym… │ 0    │ 0        │ Deployment │ grafana   │          │ 40m -> ? (No     │ unset -> ? (No   │             │ 75Mi -> ? (No    │ 90Mi -> ? (No     │
│        │                  │                  │      │          │            │           │          │ data)            │ data)            │             │ data)            │ data)             │
└────────┴──────────────────┴──────────────────┴──────┴──────────┴────────────┴───────────┴──────────┴──────────────────┴──────────────────┴─────────────┴──────────────────┴───────────────────┘
                                                                                         100 points - A

When I execute the same command for the same cluster with the 1.4.1 version, it works fine:

> krr simple --namespace monitor-operator --selector="app = grafana"                                                                               


 _____       _               _          _  _______  _____
|  __ \     | |             | |        | |/ /  __ \|  __ \
| |__) |___ | |__  _   _ ___| |_ __ _  | ' /| |__) | |__) |
|  _  // _ \| '_ \| | | / __| __/ _` | |  < |  _  /|  _  /
| | \ \ (_) | |_) | |_| \__ \ || (_| | | . \| | \ \| | \ \
|_|  \_\___/|_.__/ \__,_|___/\__\__,_| |_|\_\_|  \_\_|  \_\



Running Robusta's KRR (Kubernetes Resource Recommender) v1.4.1
Using strategy: Simple
Using formatter: table

[INFO] Using clusters: ['<cluster>']
[INFO] Listing scannable objects in <cluster>
[INFO] Found 1 objects across 1 namespaces in <cluster>
on 0: [INFO] Connecting to Prometheus for <cluster> cluster
on 0: [INFO] Using Prometheus at https://<server>/api/v1/namespaces/<namespace>/services/prometheus-server-service:9091/proxy for cluster <cluster>   
on 0: [INFO] Prometheus found
on 0: [INFO] Prometheus connected successfully for <cluster>  cluster
Calculating Recommendation |████████████████████████████████████████| 1/1 [100%] in 5.5s (0.18/s)



Simple Strategy

CPU request: 99.0% percentile, limit: unset
Memory request: max + 5.0%, limit: max + 5.0%

This strategy does not work with objects with HPA defined (Horizontal Pod Autoscaler).
If HPA is defined for CPU or Memory, the strategy will return "?" for that resource.

Learn more: https://github.com/robusta-dev/krr#algorithm

┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓    
┃ Number ┃ Namespace        ┃ Name               ┃ Pods ┃ Old Pods ┃ Type       ┃ Container ┃ CPU Diff ┃ CPU Requests     ┃ CPU Limits ┃ Memory Diff ┃ Memory Requests      ┃ Memory Limits ┃    
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩    
│     1. │ <namespace>      │ grafana-deployment │ 1    │ 0        │ Deployment │ grafana   │ -35m     │ (-35m) 40m -> 5m │ unset      │ -15Mi       │ (-15Mi) 75Mi -> 60Mi │ 90Mi -> 60Mi  │    
└────────┴──────────────────┴────────────────────┴──────┴──────────┴────────────┴───────────┴──────────┴──────────────────┴────────────┴─────────────┴──────────────────────┴───────────────┘    
                                                                                       100 points - A    

To Reproduce
Steps to reproduce the behavior:

  1. execute the simple strategy only with the namespace and selector parameters

Expected behavior

Recommendations should be calculated.

Desktop

  • OS: Microsoft Windows 11 Enterprise, 10.0.22621
  • Browser: Brave, 1.56.20 Chromium: 115.0.5790.171 (Official Build) (64-bit)

Providing strategy settings on command line

Hi!

Thanks for creating this! I've been trying to use it, and for the most part it's been a very nice experience. However, I am having some issues with getting the strategy parameters to work.

I loaded up the project in VSCode and started the application (with a few breakpoints) with the following parameters:

krr.py simple -p "https://<my prometheus endpoint>" -n "selfhosted" --history_duration "1" --memory_buffer_percentage "10"

I notice that the values for history_duration and memory_buffer_percentage are still 336 and 5 respectively. Am I doing something wrong in the CLI call?

krr.py simple --help gives me this:

Usage: krr.py simple [OPTIONS]

 Run KRR using the `simple` strategy

╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --help          Show this message and exit.                                                                                                                                      │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Kubernetes Settings ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --cluster    -c      TEXT  List of clusters to run on. By default, will run on the current cluster. Use '*' to run on all clusters. [default: None]                              │
│ --namespace  -n      TEXT  List of namespaces to run on. By default, will run on all namespaces. [default: None]                                                                 │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Prometheus Settings ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --prometheus-url          -p      TEXT  Prometheus URL. If not provided, will attempt to find it in kubernetes cluster [default: None]                                           │
│ --prometheus-auth-header          TEXT  Prometheus authentication header. [default: None]                                                                                        │
│ --prometheus-ssl-enabled                Enable SSL for Prometheus requests.                                                                                                      │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Logging Settings ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --formatter  -f      TEXT  Output formatter (json, pprint, table, yaml) [default: table]                                                                                         │
│ --verbose    -v            Enable verbose mode                                                                                                                                   │
│ --quiet      -q            Enable quiet mode                                                                                                                                     │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Strategy Settings ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --history_duration                TEXT  The duration of the history data to use (in hours). [default: 336]                                                                       │
│ --timeframe_duration              TEXT  The step for the history data (in minutes). [default: 15]                                                                                │
│ --cpu_percentile                  TEXT  The percentile to use for the CPU recommendation. [default: 99]                                                                          │
│ --memory_buffer_percentage        TEXT  The percentage of added buffer to the peak memory usage for memory recommendation. [default: 5]                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Allow to run on specific namespaces with restricted permission

Describe the bug
On the cluster I used, I don't have access to all namespaces.
Even if I specify my namespaces, it seems it tries to get the resources through a cluster-scoped API instead of a namespaced one.

To Reproduce
Steps to reproduce the behavior:

  1. Make sure you don't have access to all namespaces, ex :
$ kubectl get po -A
Error from server (Forbidden): pods is forbidden: User "u-wf3je4hm2h" cannot list resource "pods" in API group "" at the cluster scope
  1. krr simple -n my_namespace
  2. See error
Running Robusta's KRR (Kubernetes Resource Recommender) 1.0.0
Using strategy: Simple
Using formatter: table

[ERROR] Error trying to list pods in cluster k8s-prod: (403)
Reason: Forbidden
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"deployments.apps is forbidden: User \"u-wf3je4hm2h\" cannot list resource \"deployments\" in API group \"apps\" at the cluster
scope","reason":"Forbidden","details":{"group":"apps","kind":"deployments"},"code":403}

Expected behavior
It should be able to get the data if I have access to the specified namespace

Thank for krr, it's awesome :)

Feature request: configurable prometheus queries

We use kube-eagle and victoria metrics.

Right now, I have to set up a scrape job from cadvisor solely dedicated to krr:

  # These metrics are required by https://github.com/robusta-dev/krr
  - job_name: kubelet
    scheme: https
    tls_config:
      ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      insecure_skip_verify: true
    bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
    kubernetes_sd_configs:
      - role: node
    relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/$1/proxy/metrics/cadvisor
    metric_relabel_configs:
      # We only need the following metrics because they are needed by robusta-dev/krr, everything else we get from kube-eagle
      - action: keep
        if: '{__name__=~"(container_cpu_usage_seconds_total|container_memory_working_set_bytes)"}'

I think it would be an improvement if one could overwrite the queries mentioned in the docs to whatever fits the local setup.
Let users figure out the equivalent queries for tools like kube-eagle.

Too many historic pods causes querystring too long for Prometheus range_query API

Describe the bug

If we have many historic pods, the query expr will be very long. Then at some point, the request may be rejected by the gateway, e.g. Nginx. HTTP code 422 may be returned.

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior

Use POST API instead of GET.

https://github.com/4n4nd/prometheus-api-client-python/blob/39c5710521134fc450e9b4103cbb5995c05c5273/prometheus_api_client/prometheus_connect.py#L403-L409

But since we are using prometheus-api-client-python, it is not possible to do this.

Screenshots

image

Desktop (please complete the following information):

  • OS: [e.g. iOS]
  • Browser [e.g. chrome, safari]
  • Version [e.g. 22]

Smartphone (please complete the following information):

  • Device: [e.g. iPhone6]
  • OS: [e.g. iOS8.1]
  • Browser [e.g. stock browser, safari]
  • Version [e.g. 22]

Additional context
Add any other context about the problem here.

KRR scans not working with Azure managed prometheus

Describe the bug
Either using the UI portal or trying the direct calls from the cli with krr.py

To Reproduce
Steps to reproduce the behavior:

  1. Setup robusta with AZ managed prometheus
  2. Use ClientID + Secret
  3. Metrics and alerts are in place but KRR scan is not working

Expected behavior
Have KRR_scans up and running via cli + UI

Be able to explicitly ask for recommendations based on max/avg mem and/or cpu

Is your feature request related to a problem? Please describe.
As a user, I'd love to be able to explicitly select krr recommendations based on:

  • average cpu
  • max cpu
  • average mem
  • max mem

Stretch-goal: Be able to mix and match them

Describe the solution you'd like

Some additional ability to use krr in the following way:

krr --avg-mem --max-cpu

Describe alternatives you've considered
A clear and concise description of any alternative solutions or features you've considered.

Writing my own custom strategy (this will take me a bit...would prefer to avoid) or hacking on a copy of the code.

Limit concurrency of asyncio for gather_objects_recommendations

Describe the bug

When we have a large number of containers/pods in the cluster, e.g. >1000.

async def _gather_objects_recommendations(
    self, objects: list[K8sObjectData]
) -> list[tuple[ResourceAllocations, MetricsData]]:
    recommendations: list[tuple[RunResult, MetricsData]] = await asyncio.gather(
        *[self._calculate_object_recommendations(object) for object in objects]
    )

will start >1000 coroutines that concurrently query the metrics server, which causes resource exhaustion, e.g. connection pool exhaustion in VictoriaMetrics, local memory usage larger than 20GB, etc.

To Reproduce
Steps to reproduce the behavior:

  1. Go to '...'
  2. Click on '....'
  3. Scroll down to '....'
  4. See error

Expected behavior

According to the SO,

async def gather_with_concurrency(n: int, *coros):
    semaphore = asyncio.Semaphore(n)

    async def sem_coro(coro):
        async with semaphore:
            return await coro
    return await asyncio.gather(*(sem_coro(c) for c in coros))

can be used to limit concurrency level.

Screenshots
If applicable, add screenshots to help explain your problem.

Desktop (please complete the following information):

  • OS: [e.g. iOS]
  • Browser [e.g. chrome, safari]
  • Version [e.g. 22]

Smartphone (please complete the following information):

  • Device: [e.g. iPhone6]
  • OS: [e.g. iOS8.1]
  • Browser [e.g. stock browser, safari]
  • Version [e.g. 22]

Additional context
Add any other context about the problem here.

AttributeError: NoneType object has no attribute items

Greetings,

I just came across KRR and it looks impressive! Unfortunately, I am unable to run it on our self-hosted Kubernetes cluster, it fails with AttributeError: 'NoneType' object has no attribute 'items' error as seen below. Any insight on fixing this will be highly appreciated. Thanks.

python krr.py simple -v -n ops


 _____       _               _          _  _______  _____
|  __ \     | |             | |        | |/ /  __ \|  __ \
| |__) |___ | |__  _   _ ___| |_ __ _  | ' /| |__) | |__) |
|  _  // _ \| '_ \| | | / __| __/ _` | |  < |  _  /|  _  /
| | \ \ (_) | |_) | |_| \__ \ || (_| | | . \| | \ \| | \ \
|_|  \_\___/|_.__/ \__,_|___/\__\__,_| |_|\_\_|  \_\_|  \_\



Running Robusta's KRR (Kubernetes Resource Recommender) 1.1.1
Using strategy: Simple
Using formatter: table

[DEBUG] Found 1 clusters: kubernetes-admin@kubernetes           (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:206)
[DEBUG] Current cluster: kubernetes-admin@kubernetes            (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:207)
[DEBUG] Configured clusters: []         (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:209)
[INFO] Using clusters: ['kubernetes-admin@kubernetes']
[INFO] Listing scannable objects in kubernetes-admin@kubernetes
[DEBUG] Namespaces: ['ops']             (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:46)
[DEBUG] Listing deployments in kubernetes-admin@kubernetes              (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:117)
[DEBUG] Listing statefulsets in kubernetes-admin@kubernetes             (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:130)
[DEBUG] Listing daemonsets in kubernetes-admin@kubernetes               (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:143)
[DEBUG] Listing jobs in kubernetes-admin@kubernetes             (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:156)
[DEBUG] Found 12 daemonsets in kubernetes-admin@kubernetes              (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:145)
[DEBUG] Found 104 statefulsets in kubernetes-admin@kubernetes           (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:132)
[DEBUG] Found 407 deployments in kubernetes-admin@kubernetes            (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:119)
[DEBUG] Found 756 jobs in kubernetes-admin@kubernetes           (/home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:158)
[ERROR] Error trying to list pods in cluster kubernetes-admin@kubernetes: 'NoneType' object has no attribute 'items'
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:49 in list_scannable_objects           │
│                                                                                                  │
│    46 │   │   self.debug(f"Namespaces: {self.config.namespaces}")                                │
│    47 │   │                                                                                      │
│    48 │   │   try:                                                                               │
│ ❱  49 │   │   │   objects_tuple = await asyncio.gather(                                          │
│    50 │   │   │   │   self._list_deployments(),                                                  │
│    51 │   │   │   │   self._list_all_statefulsets(),                                             │
│    52 │   │   │   │   self._list_all_daemon_set(),                                               │
│                                                                                                  │
│ /home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:121 in _list_deployments               │
│                                                                                                  │
│   118 │   │   ret: V1DeploymentList = await asyncio.to_thread(self.apps.list_deployment_for_al   │
│   119 │   │   self.debug(f"Found {len(ret.items)} deployments in {self.cluster}")                │
│   120 │   │                                                                                      │
│ ❱ 121 │   │   return await asyncio.gather(                                                       │
│   122 │   │   │   *[                                                                             │
│   123 │   │   │   │   self.__build_obj(item, container)                                          │
│   124 │   │   │   │   for item in ret.items                                                      │
│                                                                                                  │
│ /home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:113 in __build_obj                     │
│                                                                                                  │
│   110 │   │   │   kind=item.__class__.__name__[2:],                                              │
│   111 │   │   │   container=container.name,                                                      │
│   112 │   │   │   allocations=ResourceAllocations.from_container(container),                     │
│ ❱ 113 │   │   │   pods=await self.__list_pods(item),                                             │
│   114 │   │   )                                                                                  │
│   115 │                                                                                          │
│   116 │   async def _list_deployments(self) -> list[K8sObjectData]:                              │
│                                                                                                  │
│ /home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:94 in __list_pods                      │
│                                                                                                  │
│    91 │   │   return ",".join(label_filters)                                                     │
│    92 │                                                                                          │
│    93 │   async def __list_pods(self, resource: Union[V1Deployment, V1DaemonSet, V1StatefulSet   │
│ ❱  94 │   │   selector = self._build_selector_query(resource.spec.selector)                      │
│    95 │   │   if selector is None:                                                               │
│    96 │   │   │   return []                                                                      │
│    97                                                                                            │
│                                                                                                  │
│ /home/pkr/krr/robusta_krr/core/integrations/kubernetes.py:84 in _build_selector_query            │
│                                                                                                  │
│    81 │                                                                                          │
│    82 │   @staticmethod                                                                          │
│    83 │   def _build_selector_query(selector: V1LabelSelector) -> Union[str, None]:              │
│ ❱  84 │   │   label_filters = [f"{label[0]}={label[1]}" for label in selector.match_labels.ite   │
│    85 │   │                                                                                      │
│    86 │   │   if selector.match_expressions is not None:                                         │
│    87 │   │   │   label_filters.extend(                                                          │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
AttributeError: 'NoneType' object has no attribute 'items'
[WARNING] Current filters resulted in no objects available to scan.
[WARNING] Try to change the filters or check if there is anything available.


┏━━━━━━━━┳━━━━━━━━━┳━━━━━━━━━━━┳━━━━━━┳━━━━━━┳━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━━┳━━━━━━━━━━━━━━┳━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━┓
┃ Number ┃ Cluster ┃ Namespace ┃ Name ┃ Pods ┃ Old Pods ┃ Type ┃ Container ┃ CPU Requests ┃ CPU Limits ┃ Memory Requests ┃ Memory Limits ┃
┡━━━━━━━━╇━━━━━━━━━╇━━━━━━━━━━━╇━━━━━━╇━━━━━━╇━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━━╇━━━━━━━━━━━━━━╇━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━┩
└────────┴─────────┴───────────┴──────┴──────┴──────────┴──────┴───────────┴──────────────┴────────────┴─────────────────┴───────────────┘

Kubernetes Info:

Self hosted cluster. Kubernetes version v1.25.7
Host OS: Ubuntu 22.04.1

Unit alignment

The units used for the current value and the recommendation are not aligned, which makes comparing them difficult. For example, in memory usage:

k vs. M: 2097152k -> 498M
Mi vs. M: 128Mi -> 995M
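
For reference, a minimal normalization sketch (not krr's implementation) that converts both spellings to a common unit so values like the ones above can be compared directly:

# Minimal sketch (not krr's code): normalize Kubernetes memory quantities
# (k, M, G, Ki, Mi, Gi) to bytes so differently-suffixed values can be compared.
import re

SUFFIXES = {
    "": 1,
    "k": 10**3, "M": 10**6, "G": 10**9,
    "Ki": 2**10, "Mi": 2**20, "Gi": 2**30,
}

def to_bytes(quantity: str) -> int:
    match = re.fullmatch(r"(\d+)([A-Za-z]*)", quantity)
    if match is None:
        raise ValueError(f"Unrecognized quantity: {quantity}")
    value, suffix = match.groups()
    return int(value) * SUFFIXES[suffix]

print(to_bytes("2097152k") / 2**20)  # ~2000 MiB
print(to_bytes("498M") / 2**20)      # ~475 MiB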

Cannot get recommendations by using krr 1.3.2

Describe the bug

I updated the krr tool from version 1.2.1 to 1.3.2 today. Unfortunately, I cannot find a way to get recommendations using the simple strategy. The same commands work fine with 1.2.1.

To Reproduce

Execute the following command:

krr.exe simple --kubeconfig <path-to-kubeconfig> --context <context-name> --namespace <namespace-name>

1.2.1

 _____       _               _          _  _______  _____
|  __ \     | |             | |        | |/ /  __ \|  __ \
| |__) |___ | |__  _   _ ___| |_ __ _  | ' /| |__) | |__) |
|  _  // _ \| '_ \| | | / __| __/ _` | |  < |  _  /|  _  /
| | \ \ (_) | |_) | |_| \__ \ || (_| | | . \| | \ \| | \ \
|_|  \_\___/|_.__/ \__,_|___/\__\__,_| |_|\_\_|  \_\_|  \_\



Running Robusta's KRR (Kubernetes Resource Recommender) v1.2.1
Using strategy: Simple
Using formatter: table

[WARNING] Could not load context from kubeconfig.
[WARNING] Falling back to clusters from CLI: ['context-name']
[INFO] Using clusters: ['context-name']
[INFO] Listing scannable objects in context-name
[INFO] Found 7 objects across 1 namespaces in context-name
on 0: [INFO] Connecting to Prometheus for context-name cluster
on 0: [INFO] Using Prometheus at https://some-domain-here/api/v1/namespaces/some-namespace-here/services/prometheus-server-service:9091/proxy for cluster context-name
on 0: [INFO] Prometheus found
on 0: [INFO] Prometheus connected successfully for context-name cluster
Calculating Recommendation |████████████████████████████████████████| 7/7 [100%] in 7.1s (0.60/s)



Simple Strategy

CPU request: 99.0% percentile, limit: unset
Memory request: max + 5.0%, limit: max + 5.0%
Learn more: https://github.com/robusta-dev/krr#algorithm
<the table is here>

1.3.2

 _____       _               _          _  _______  _____
|  __ \     | |             | |        | |/ /  __ \|  __ \
| |__) |___ | |__  _   _ ___| |_ __ _  | ' /| |__) | |__) |
|  _  // _ \| '_ \| | | / __| __/ _` | |  < |  _  /|  _  /
| | \ \ (_) | |_) | |_| \__ \ || (_| | | . \| | \ \| | \ \
|_|  \_\___/|_.__/ \__,_|___/\__\__,_| |_|\_\_|  \_\_|  \_\



Running Robusta's KRR (Kubernetes Resource Recommender) v1.3.2
Using strategy: Simple
Using formatter: table

[WARNING] Could not load context from kubeconfig.
[WARNING] Falling back to clusters from CLI: ['context-name']
[INFO] Using clusters: ['context-name']
[INFO] Listing scannable objects in context-name
[INFO] Found 7 objects across 1 namespaces in context-name
on 0: [INFO] Connecting to Prometheus for context-name cluster
on 0: [INFO] Using Prometheus at https://some-domain-here/api/v1/namespaces/some-namespace-here/services/prometheus-server-service:9091/proxy for cluster context-name
on 0: [INFO] Prometheus found
Calculating Recommendation |⚠︎                                       | (!) 0/7 [0%] in 3.9s (0.00/s)
[ERROR] No label specified, Rerun krr with the flag `-l <cluster>` where <cluster> is one of [<a very long list of items>]

The very long list of items contains all K8s namespaces. As requested, I added the -l parameter with the namespace-name value (that value was in the list):

krr.exe simple --kubeconfig <path-to-kubeconfig> --context <context-name> --namespace <namespace-name> -l <namespace-name>
 _____       _               _          _  _______  _____
|  __ \     | |             | |        | |/ /  __ \|  __ \
| |__) |___ | |__  _   _ ___| |_ __ _  | ' /| |__) | |__) |
|  _  // _ \| '_ \| | | / __| __/ _` | |  < |  _  /|  _  /
| | \ \ (_) | |_) | |_| \__ \ || (_| | | . \| | \ \| | \ \
|_|  \_\___/|_.__/ \__,_|___/\__\__,_| |_|\_\_|  \_\_|  \_\



Running Robusta's KRR (Kubernetes Resource Recommender) v1.3.2
Using strategy: Simple
Using formatter: table

[WARNING] Could not load context from kubeconfig.
[WARNING] Falling back to clusters from CLI: ['context-name']
[INFO] Using clusters: ['context-name']
[INFO] Listing scannable objects in context-name
[INFO] Found 7 objects across 1 namespaces in context-name
on 0: [INFO] Connecting to Prometheus for context-name cluster
on 0: [INFO] Using Prometheus at https://some-domain-here/api/v1/namespaces/some-namespace-here/services/prometheus-server-service:9091/proxy for cluster context-name
on 0: [INFO] Prometheus found
on 0: [WARNING] Prometheus returned no MemoryMetricLoader metrics for StatefulSet namespace-name/statefulset-name-1/container-name-1
on 0: [WARNING] Prometheus returned no CPUMetricLoader metrics for StatefulSet namespace-name/statefulset-name-1/container-name-1
on 1: [WARNING] Prometheus returned no CPUMetricLoader metrics for StatefulSet namespace-name/statefulset-name-2/container-name-2
on 1: [WARNING] Prometheus returned no MemoryMetricLoader metrics for StatefulSet namespace-name/statefulset-name-2/container-name-2
[...]

Simple Strategy

CPU request: 99.0% percentile, limit: unset
Memory request: max + 5.0%, limit: max + 5.0%

This strategy does not work with objects with HPA defined (Horizontal Pod Autoscaler).
If HPA is defined for CPU or Memory, the strategy will return "?" for that resource.

Learn more: https://github.com/robusta-dev/krr#algorithm

┏━━━━━━━━┳━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Number ┃ Namespace      ┃ Name                           ┃ Pods ┃ Old Pods ┃ Type        ┃ Container              ┃ CPU Diff ┃ CPU Requests         ┃ CPU Limits           ┃ Memory Diff ┃ Memory Requests       ┃ Memory Limits         ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━┩
│     1. │ namespace-name │ deployment-name-1              │ 2    │ 0        │ Deployment  │ container-name-1       │          │ 500m -> ? (No data)  │ 1 -> ? (No data)     │             │ 1600Mi -> ? (No data) │ 1920Mi -> ? (No data) │
├────────┼────────────────┼────────────────────────────────┼──────┼──────────┼─────────────┼────────────────────────┼──────────┼──────────────────────┼──────────────────────┼─────────────┼───────────────────────┼───────────────────────┤
[...]

<all recommendations are set to no data>

My kubeconfig:

apiVersion: v1
kind: Config
clusters:
  - name: context-name
    cluster:
        server: 'https://some-domain-here'
        certificate-authority-data: >-
            some-data-here
contexts:
  - name: context-name
    context:
        cluster: context-name
        user: context-name-token
        namespace: default
current-context: context-name
users:
  - name: context-name-token
    user:
        exec:
            apiVersion: client.authentication.k8s.io/v1beta1
            command: kubectl
            args:
              - oidc-login
              - get-token
              - some-params-here

Expected behavior

Recommendations should be provided with valid data.

Desktop:

  • OS: Microsoft Windows 11 Enterprise, 10.0.22621 N/A Build 22621
  • PowerShell 7.3.6
  • Browser: Brave 1.52.130 with Chromium 114.0.5735.198 (Official Build)

Add option to select object for which recommendations will be generated

Is your feature request related to a problem? Please describe.

I work on a project where we have many different components in one namespace. When I am interested in recommendations for only one of them, I can limit the tool to a single namespace, but I cannot limit it to specific objects. The problem is that the same component is deployed in many clusters and its configuration is currently fixed: the same value for that specific component in all clusters. To get the recommendations, I execute the krr tool on the same namespace in all clusters. That process is quite long because the tool prepares recommendations for 36 pods in every cluster, while I am only interested in the results for one of those pods in each cluster.

Describe the solution you'd like

The simple strategy provides the --namespace parameter to select which namespace should be checked. It would be nice to have an additional parameter called --name. Then I could specify:

krr simple --namespace namespace --name component1

and only one component (the name of the deployment/statefulset) would be checked, with its 1-X pods.

Describe alternatives you've considered

I saw there is a new parameter which has not been released yet: --selector. If the K8s objects are labeled properly, it could be used to find such items too.

Support metrics-based workload discovery

Is your feature request related to a problem? Please describe.

In some cases, we want to run KRR locally, but for security reasons the Kubernetes API server cannot be accessed from outside the cluster.

In that case we could use Prometheus-based workload discovery, provided kube-state-metrics is installed.

Describe the solution you'd like

We can do the workload-based discovery with the following steps:

  1. List Deployments together with their ReplicaSets:
replicasets = await self.metrics_loader.loader.query("count by (namespace, owner_name, replicaset) (kube_replicaset_owner{"
                                               f'namespace=~"{ns}", '
                                               'owner_kind="Deployment"})')
  2. List Pods from the group of ReplicaSets:
# owner_name is ReplicaSet names
pods = await self.metrics_loader.loader.query("count by (owner_name, replicaset, pod) (kube_pod_owner{"
                                               f'namespace="{namespace}", '
                                               f'owner_name=~"{owner_name}", '
                                               'owner_kind="ReplicaSet"})')
  3. List containers from the Pods found in step (2):
containers = await self.metrics_loader.loader.query("count by (container) (kube_pod_container_info{"
                                               f'namespace="{namespace}", '
                                               f'pod=~"{pod_selector}"'
                                               "})")
  4. Build K8sObjectData for the containers found in step (3):
async def __build_from_owner(self, namespace: str, app_name: str, containers: List[str], pod_names: List[str]) -> List[K8sObjectData]:
        return [
            K8sObjectData(
                cluster=None,
                namespace=namespace,
                name=app_name,
                kind="Deployment",
                container=container_name,
                allocations=await self.__parse_allocation(namespace, "|".join(pod_names), container_name), # find 
                pods=[PodData(name=pod_name, deleted=False) for pod_name in pod_names], # list pods
            )
            for container_name in containers
        ]
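
For completeness, a self-contained sketch of the same flow using prometheus_api_client directly (the URL and namespace are placeholders; this illustrates the proposal and is not krr code):

# Standalone sketch of metrics-based discovery via kube-state-metrics.
# PROMETHEUS_URL and NAMESPACE are placeholders; error handling is omitted.
from prometheus_api_client import PrometheusConnect

PROMETHEUS_URL = "http://localhost:9090"
NAMESPACE = "default"

prom = PrometheusConnect(url=PROMETHEUS_URL, disable_ssl=True)

# 1. Deployments and their ReplicaSets.
replicasets = prom.custom_query(
    f'count by (namespace, owner_name, replicaset) (kube_replicaset_owner{{namespace="{NAMESPACE}", owner_kind="Deployment"}})'
)

# 2. Pods owned by those ReplicaSets.
rs_names = "|".join(r["metric"]["replicaset"] for r in replicasets)
pods = prom.custom_query(
    f'count by (owner_name, pod) (kube_pod_owner{{namespace="{NAMESPACE}", owner_name=~"{rs_names}", owner_kind="ReplicaSet"}})'
)

# 3. Containers running in those Pods.
pod_names = "|".join(p["metric"]["pod"] for p in pods)
containers = prom.custom_query(
    f'count by (container) (kube_pod_container_info{{namespace="{NAMESPACE}", pod=~"{pod_names}"}})'
)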


krr doesn't find pods in target namespace - 404

Describe the bug

$ python3.11 krr.py simple -n $NAMESPACE
...
[ERROR] Error trying to list pods in $NAMESPACE (404)
...
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"the server could not find the requested resource","reason":"NotFound","details":{},"code":404}
...

To Reproduce
see above

Expected behavior
Finds pods deployed in provided namespace

Desktop (please complete the following information):

  • OS:
$ cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.6 LTS (Focal Fossa)"
  • Version:
$ python3.11 krr.py version
1.3.0-dev

$ kubectl version --short
Client Version: v1.26.0
Kustomize Version: v4.5.7
Server Version: v1.21.8

Additional context
kubectl client works as expected.

PrometheusApiClientException: HTTP Status Code 414 Request-URI Too Large

Describe the bug
"krr simple -n kube-system -p https://prometheus.example.com" gives "Request-URI Too Large" exception

To Reproduce
Not sure if this is reproducible in every environment. I am trying the command on a cluster with 165 nodes; not sure if that is a big number for krr.

Expected behavior
report should appear

Screenshots
Pasting the error with masked URL and cluster name:

$ krr simple -n kube-system -p https://prometheus.example.com


 _____       _               _          _  _______  _____
|  __ \     | |             | |        | |/ /  __ \|  __ \
| |__) |___ | |__  _   _ ___| |_ __ _  | ' /| |__) | |__) |
|  _  // _ \| '_ \| | | / __| __/ _` | |  < |  _  /|  _  /
| | \ \ (_) | |_) | |_| \__ \ || (_| | | . \| | \ \| | \ \
|_|  \_\___/|_.__/ \__,_|___/\__\__,_| |_|\_\_|  \_\_|  \_\



Running Robusta's KRR (Kubernetes Resource Recommender) v1.2.1
Using strategy: Simple
Using formatter: table

[INFO] Using clusters: ['k8-cluster']
[INFO] Listing scannable objects in k8-cluster
[INFO] Found 30 objects across 1 namespaces in k8-cluster
on 0: [INFO] Connecting to Prometheus for k8-cluster cluster
on 0: [INFO] Using Prometheus at https://prometheus.example.com for cluster k8-cluster
on 0: [INFO] Prometheus found
on 0: [INFO] Prometheus connected successfully for k8-cluster cluster
Calculating Recommendation |⚠︎                                       | (!) 0/30 [0%] in 1:55.2 (0.00/s)
╭─────────────────────────────── Traceback (most recent call last) ────────────────────────────────╮
│ /private/tmp/robusta_krr/core/runner.py:174 in run                                               │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/private/tmp/robusta_krr/core/runner.py'                   │
│                                                                                                  │
│ /private/tmp/robusta_krr/core/runner.py:153 in _collect_result                                   │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/private/tmp/robusta_krr/core/runner.py'                   │
│                                                                                                  │
│ /private/tmp/robusta_krr/core/runner.py:125 in _gather_objects_recommendations                   │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/private/tmp/robusta_krr/core/runner.py'                   │
│                                                                                                  │
│ /private/tmp/robusta_krr/core/runner.py:101 in _calculate_object_recommendations                 │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/private/tmp/robusta_krr/core/runner.py'                   │
│                                                                                                  │
│ /private/tmp/robusta_krr/core/integrations/prometheus/loader.py:90 in gather_data                │
│                                                                                                  │
│ [Errno 2] No such file or directory:                                                             │
│ '/private/tmp/robusta_krr/core/integrations/prometheus/loader.py'                                │
│                                                                                                  │
│                                     ... 2 frames hidden ...                                      │
│                                                                                                  │
│ /private/tmp/robusta_krr/core/integrations/prometheus/metrics/base_filtered_metric.py:61 in      │
│ query_prometheus                                                                                 │
│                                                                                                  │
│ [Errno 2] No such file or directory:                                                             │
│ '/private/tmp/robusta_krr/core/integrations/prometheus/metrics/base_filtered_metric.py'          │
│                                                                                                  │
│ /private/tmp/robusta_krr/core/integrations/prometheus/metrics/base_metric.py:71 in               │
│ query_prometheus                                                                                 │
│                                                                                                  │
│ [Errno 2] No such file or directory:                                                             │
│ '/private/tmp/robusta_krr/core/integrations/prometheus/metrics/base_metric.py'                   │
│                                                                                                  │
│ /private/tmp/asyncio/threads.py:25 in to_thread                                                  │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/private/tmp/asyncio/threads.py'                           │
│                                                                                                  │
│ /private/tmp/concurrent/futures/thread.py:58 in run                                              │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/private/tmp/concurrent/futures/thread.py'                 │
│                                                                                                  │
│ /private/tmp/prometheus_api_client/prometheus_connect.py:408 in custom_query_range               │
│                                                                                                  │
│ [Errno 2] No such file or directory: '/private/tmp/prometheus_api_client/prometheus_connect.py'  │
╰──────────────────────────────────────────────────────────────────────────────────────────────────╯
PrometheusApiClientException: HTTP Status Code 414 (b'<html>\r\n<head><title>414 Request-URI Too Large</title></head>\r\n<body>\r\n<center><h1>414 Request-URI Too
Large</h1></center>\r\n<hr><center>nginx</center>\r\n</body>\r\n</html>\r\n')


Additional context
On the same cluster, it worked on a namespace with 3 pods, so it looks like something to do with the large number of pods.
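
A possible mitigation (a sketch only, not current krr behaviour): split the long pod-name regex selector into smaller batches so each query URL stays short. Prometheus also accepts POST for /api/v1/query, which would avoid the URI limit entirely if the client library supports it.

# Sketch: batch a long pod-name regex selector so each Prometheus query URL
# stays well below typical proxy URI limits. The metric name is just an example.
from prometheus_api_client import PrometheusConnect

def query_in_batches(prom: PrometheusConnect, pod_names: list[str], batch_size: int = 50):
    results = []
    for i in range(0, len(pod_names), batch_size):
        selector = "|".join(pod_names[i:i + batch_size])
        results.extend(prom.custom_query(f'container_cpu_usage_seconds_total{{pod=~"{selector}"}}'))
    return results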

krr version information hasn't been updated since v1.1.1

Describe the bug
The krr version information stored in the following locations has not been bumped since the release of v1.1.1:

This results in wrong package metadata and incorrect output of the version command:

~/git/krr ((HEAD detached at v1.2.1))$ python krr.py version
1.1.1

To Reproduce
Steps to reproduce the behavior:

  1. Clone the git repo
  2. Checkout a specific version tag > 1.1.1 via git checkout tags/v1.2.1
  3. Run the version command: python krr.py version

Expected behavior
The version information and package metadata should be updated on every release.

Connecting to prometheus Behaviour

In the documentation, it should be made clear that the Prometheus service needs to have one of the following labels:

"app=kube-prometheus-stack-prometheus",
"app=prometheus,component=server",
"app=prometheus-server",
"app=prometheus-operator-prometheus",
"app=prometheus-msteams",
"app=rancher-monitoring-prometheus",
"app=prometheus-prometheus",

In addition, IMO the Ingress host should likely be the default URL.
As a fallback, it could try the Service LoadBalancer or access Prometheus through the cluster API.

Another small point: as developers may not have access to the Prometheus server namespace, it might be worth stating that the recommended way to run this is with the -p flag.
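
For illustration only, a sketch (assuming the kubernetes Python client and the labels above; not krr's actual discovery code) of how a Prometheus service could be found by those labels:

# Sketch: find a Prometheus service by the label selectors listed above,
# using the official kubernetes Python client.
from kubernetes import client, config

LABEL_SELECTORS = [
    "app=kube-prometheus-stack-prometheus",
    "app=prometheus,component=server",
    "app=prometheus-server",
]

config.load_kube_config()
v1 = client.CoreV1Api()

for selector in LABEL_SELECTORS:
    services = v1.list_service_for_all_namespaces(label_selector=selector)
    if services.items:
        svc = services.items[0]
        print(f"Found Prometheus candidate: {svc.metadata.namespace}/{svc.metadata.name}")
        break
else:
    print("No Prometheus service found; pass the URL explicitly with -p")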

Excel Export

As much as we all hate xlsx documents, I was asked to put all krr output into a spreadsheet and share it via email.

As a platform engineer, I would like a custom output format for xlsx or csv.
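
In the meantime, a hedged workaround sketch: convert the existing JSON output with pandas (this assumes the report contains a list of scan records, e.g. under a "scans" key; the actual schema may differ):

# Workaround sketch: convert krr's JSON output to CSV/xlsx with pandas.
# Assumes `krr simple -f json > report.json` was run first.
import json
import pandas as pd  # plus openpyxl for the .xlsx writer

with open("report.json") as f:
    report = json.load(f)

# "scans" is a hypothetical key; adapt to the actual JSON structure.
records = report["scans"] if isinstance(report, dict) else report
df = pd.json_normalize(records)
df.to_csv("krr-report.csv", index=False)
df.to_excel("krr-report.xlsx", index=False)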

Can't install v1.2.1 through Homebrew on linux

Describe the bug
Cannot install krr with Homebrew on Linux since the release artifacts have been renamed.

To Reproduce
Steps to reproduce the behavior:

  1. brew upgrade krr
    ==> Upgrading 1 outdated package:
    robusta-dev/krr/krr 1.0.0 -> 1.2.1
    ==> Fetching robusta-dev/krr/krr
    Error: krr: Failed to download resource "krr"
    Failure while executing; /usr/bin/env /home/linuxbrew/.linuxbrew/Homebrew/Library/Homebrew/shims/shared/curl --disable --cookie /dev/null --globoff --show-error --user-agent Linuxbrew/4.0.21\ \(Linux\;\ x86_64\ Ubuntu\ 20.04.6\ LTS\)\ curl/7.68.0 --header Accept-Language:\ en --retry 3 --fail --location --silent --head --request GET https://github.com/robusta-dev/krr/releases/download/v1.2.1/krr-linux-latest-v1.2.1.zip exited with 22. Here's the output:
    curl: (22) The requested URL returned error: 404

The issue is with the artifact name on Linux, which is krr-linux-latest-v1.2.1.zip in the brew formula but krr-ubuntu-latest-v1.2.1.zip on the release page.

Not sure on which side you want it to be fixed (Homebrew formula vs. artifact name).

Expected behavior
Krr should install properly


Desktop (please complete the following information):

  • Windows 10 with WSL2

Add Grafana Mimir integration

Is your feature request related to a problem? Please describe.
Grafana Mimir (even without authentication, so this is partially related to #18) requires specifying the tenant via HTTP header X-Scope-OrgID.

Describe the solution you'd like
An option to either pass arbitrary headers or a specific flag for Mimir.
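
For reference, a minimal sketch of the header that needs to be sent (prometheus_api_client already accepts custom headers; the URL and tenant id are placeholders):

# Sketch: querying Grafana Mimir with the required tenant header.
from prometheus_api_client import PrometheusConnect

prom = PrometheusConnect(
    url="https://mimir.example.com/prometheus",
    headers={"X-Scope-OrgID": "my-tenant"},
    disable_ssl=True,
)
print(prom.check_prometheus_connection())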


Prometheus high memory utilization because of krr queries

Describe the bug

I first noticed a problem when I clicked Rescan on the Efficiency panel of Robusta.dev that connects to Prometheus and gathers metrics from my cluster where I have installed the Robusta Helm chart.

My Prometheus Pod, which normally uses less than 1.5 GiB of memory, suddenly needs a lot more than 2.5 GiB. Because it's running on a node with only 4 GiB of RAM (and is limited to 3 GiB in the Pod spec), the pod is OOM-killed. You can see that behavior in the graph below. The Robusta UI also shows the OOM-killed pod.

I believe that feature uses krr in the background, so I also tried running it directly on the CLI pointing at the same Prometheus endpoint, which produced the same result.

Is there anything I could do to improve that (besides the obvious increase in memory limits)?

To Reproduce
Steps to reproduce the behavior:

  1. Run krr pointing to a Prometheus pod with a 3 GiB memory limit
  2. Prometheus should have 250k+ time series and 10k+ label pairs
  3. Do not limit the namespaces scanned by krr

Expected behavior
Not cause Prometheus to crash

Screenshots
(screenshot of the Prometheus memory usage graph omitted)

Additional context

When scanning a single namespace, Prometheus doesn't crash, but you can still notice a big spike in memory utilization.
(screenshot omitted)

PrometheusNotFound: when running the Rescan from the UI.

Describe the bug

In the UI under Efficiency, when you select a cluster to rescan and then click Rescan, you get an error in the logs that says PrometheusNotFound: Prometheus url could not be found while scanning in default cluster.

I am using an external Prometheus.

To Reproduce
Steps to reproduce the behavior:

  1. Go to 'Efficiency'
  2. Select a cluster from the 'Cluster to rescan' dropdown.
  3. Click on 'Rescan'.
  4. Wait for the message to pop up saying the job failed.
  5. See the error in the logs.

Expected behavior
I would expect this to rescan the cluster

Desktop (please complete the following information):

  • OS: MacOS
  • Browser Chrome
  • Version 113.0.5672.92

Brew package

Right now, installing is a little convoluted if you just want to take the tool for a spin; we are all familiar with the pain of Python envs.

It would be great if krr had a brew package.

Consider enriching `Reason: Not Found` error

Running krr simple simply returns the following error:

[ERROR] Error trying to list pods in cluster foo@bar: (404)
Reason: Not Found
HTTP response headers: HTTPHeaderDict({'Audit-Id': 'XXX', 'Cache-Control': 'no-cache, private', 'Content-Type': 'application/json',
'X-Kubernetes-Pf-Flowschema-Uid': 'XXX', 'X-Kubernetes-Pf-Prioritylevel-Uid': 'XXX', 'Date': 'Mon, 17 Jul 2023 09:11:41 GMT',
'Content-Length': '174'})
HTTP response body: {"kind":"Status","apiVersion":"v1","metadata":{},"status":"Failure","message":"the server could not find the requested
resource","reason":"NotFound","details":{},"code":404}

I'm not sure the error above is descriptive enough for the end user. It would be better to enrich and reword it, if possible.
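
For example, a sketch (not krr's actual code) of wrapping the 404 from the kubernetes client in a friendlier message:

# Sketch: catch the 404 from the kubernetes client and reword it.
from kubernetes import client
from kubernetes.client.exceptions import ApiException

def list_pods_or_explain(core_v1: client.CoreV1Api, namespace: str):
    try:
        return core_v1.list_namespaced_pod(namespace)
    except ApiException as e:
        if e.status == 404:
            raise RuntimeError(
                f"Could not list pods in '{namespace}' (404 Not Found). Check that the "
                "kubeconfig context points at the intended cluster and that the "
                "namespace exists."
            ) from e
        raise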

krr version: v1.3.2
Kubernetes: v1.20.7

Pods are not found

Describe the bug
We have Pods which are created by another Pod (GitLab Runner Kubernetes Executor).
These Pods don't have a Deployment, and even while they are running, they are not found by krr.

How can these Pods also be checked?

Screenshots
(screenshot omitted)

Does not work with a proxy

I need a proxy to get to our kubernetes environments.
The proxy is specified in the kubeconfig, but it seems krr isn't using it.
I also tried setting the http(s)_proxy environment variables, but that didn't work either.

Using krr v1.0.0.
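
As a possible workaround sketch (not verified against krr itself): the underlying kubernetes Python client can be pointed at a proxy explicitly, so krr could expose something similar. The proxy URL below is a placeholder.

# Sketch: configuring the kubernetes Python client to use an HTTP proxy.
from kubernetes import client, config

config.load_kube_config()
conf = client.Configuration.get_default_copy()
conf.proxy = "http://proxy.example.com:3128"
client.Configuration.set_default(conf)

v1 = client.CoreV1Api()
print(len(v1.list_namespace().items))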

Krr tries to autodiscover prometheus (which takes time) even though -p flag is given

Krr tries to autodiscover Prometheus/Victoria Metrics/Thanos even though the -p flag is passed to the command.

If it's relevant (it might be because of the [DEBUG] Prometheus not found log line), I'm using a port-forwarded GKE managed Prometheus here.

Note that despite the log messages, it is actually working and spitting out recommendations.

krr simple -v -p http://127.0.0.1:9090 


 _____       _               _          _  _______  _____
|  __ \     | |             | |        | |/ /  __ \|  __ \
| |__) |___ | |__  _   _ ___| |_ __ _  | ' /| |__) | |__) |
|  _  // _ \| '_ \| | | / __| __/ _` | |  < |  _  /|  _  /
| | \ \ (_) | |_) | |_| \__ \ || (_| | | . \| | \ \| | \ \
|_|  \_\___/|_.__/ \__,_|___/\__\__,_| |_|\_\_|  \_\_|  \_\



Running Robusta's KRR (Kubernetes Resource Recommender) v1.4.1
Using strategy: Simple
Using formatter: table

[DEBUG] Found 2 clusters: production, staging
(/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:370)
[DEBUG] Current cluster: staging                (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:371)
[DEBUG] Configured clusters: []         (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:373)
[INFO] Using clusters: ['staging']
[INFO] Listing scannable objects in staging
[DEBUG] Namespaces: *           (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:64)
[DEBUG] Listing deployments in staging          (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:143)
[DEBUG] Listing ArgoCD rollouts in staging              (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:163)
[DEBUG] Listing statefulsets in staging         (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:190)
[DEBUG] Listing daemonsets in staging           (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:210)
[DEBUG] Listing jobs in staging         (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:230)
[DEBUG] Found 1 rollouts in staging             (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:179)
[DEBUG] Found 3 statefulsets in staging         (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:199)
[DEBUG] Found 16 jobs in staging                (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:239)
[DEBUG] Found 38 daemonsets in staging          (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:219)
[DEBUG] Found 72 deployments in staging         (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/kubernetes.pyc:152)
[INFO] Found 175 objects across 19 namespaces in staging
on 0: [INFO] Connecting to Prometheus for staging cluster
on 0: [INFO] Using Prometheus at http://127.0.0.1:9090 for cluster staging
on 0: [DEBUG] Prometheus not found            (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/prometheus/loader.pyc:72)
on 0: [INFO] Connecting to Victoria for staging cluster
on 0: [INFO] Using Victoria at http://127.0.0.1:9090 for cluster staging
on 0: [DEBUG] Victoria Metrics not found              (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/prometheus/loader.pyc:72)
on 0: [INFO] Connecting to Thanos for staging cluster
on 0: [INFO] Using Thanos at http://127.0.0.1:9090 for cluster staging
on 0: [DEBUG] Thanos not found                (/opt/homebrew/Cellar/krr/1.4.1/libexec/robusta_krr/core/integrations/prometheus/loader.pyc:72)
on 0: [ERROR] No Prometheus or metrics service found
Calculating Recommendation |⚠︎                                       | (!) 0/175 [0%] in 6:00.3 (0.00/s)

Ability to set custom Prometheus data source for `remote_write` users

This feature is desirable for users who use Prometheus's remote_write feature. We push the metrics to remote Victoria Metrics agents, so all Prometheus (kube-prometheus) instances in our clusters have only 4 hours of retention. We also append a _cluster=my-cluster-name label to each metric to identify where the metric is coming from.

$ kubectl get secrets prometheus-prometheus-operator-kube-p-prometheus -o json | jq '.data."prometheus.yaml.gz"' -r | base64 -d | gunzip | yq e '.remote_write' -

- url: http://my-remote-vmagent/api/v1/write

It'd be nice to manually set the Prometheus instance address with some custom label selectors (to ensure we are querying the correct cluster). Any thoughts?
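
For context, a sketch of the kind of label-scoped query this would imply (the endpoint URL is a placeholder and _cluster is the label from our remote_write setup):

# Sketch: querying a central remote-write endpoint and scoping every query
# with the cluster-identifying label.
from prometheus_api_client import PrometheusConnect

prom = PrometheusConnect(url="http://my-central-victoria-metrics:8428", disable_ssl=True)
result = prom.custom_query('container_memory_working_set_bytes{_cluster="my-cluster-name"}')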

Restrict workload selection

Is your feature request related to a problem? Please describe.
One might have a huge cluster and want to focus on optimizing just a particular workload.
Prometheus queries might otherwise take too long while analysing all workloads in a cluster/namespace.

Describe the solution you'd like
Introduce a new command line argument as kubectl has:

    -l, --selector='':
	Selector (label query) to filter on, supports '=', '==', and '!='.(e.g. -l key1=value1,key2=value2). Matching
	objects must satisfy all of the specified label constraints.

to limit the workload selection. E.g.:

krr simple --context my-cluster --namespace kube-system --selector app-instance=metrics-server
# or
krr simple --context my-cluster --namespace kube-system --selector owner=devops
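
Under the hood such a flag could map directly onto the Kubernetes API's label_selector parameter; a minimal sketch (not krr's implementation):

# Sketch: how a --selector value could be forwarded to the Kubernetes API.
from kubernetes import client, config

config.load_kube_config(context="my-cluster")
apps = client.AppsV1Api()

deployments = apps.list_namespaced_deployment(
    "kube-system", label_selector="app-instance=metrics-server"
)
for d in deployments.items:
    print(d.metadata.name)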

Describe alternatives you've considered

krr simple --context my-cluster --namespace kube-system --workload Deployment/metrics-server

Additional context

Thanks for this wonderful tool. Happy to learn if there is an alternative solution for me.

Error trying to list pods

Describe the bug
When I run the simplest possible command, no pod is detected.

Full output with -v arg: https://pastebin.com/XZTGz2HY

Screenshots
(screenshot omitted)

Desktop (please complete the following information):

  • OS: Ubuntu WSL

Additional context

  • K8s version: 1.26.4

Applying recommendations immediately leads to OOMKilled on startup

Describe the bug
I am new to right-sizing k8s clusters and thought I'd give krr a try.

I've used all the default settings (history and buffer) and krr has suggested significant changes across many of my pods.

Upon applying them across two of my example pods, I immediately get OOMKilled errors.

To Reproduce
N/A

Expected behavior
A pod's expected memory footprint to be taken into consideration and not have its memory limit set so low that it can't start.

Screenshots
(three screenshots omitted)

As can be seen in the above screenshot, HomeAssistant exceeds 1GB of memory use on a number of occasions in the past week, but krr suggests 516Mi as its memory limit.

Any tips on how to make this more useful?

Can't krr use a NodePort?

My Kubernetes cluster is on another host, and it seems impossible to use a NodePort to access the Prometheus server address. What should I do?

(screenshot omitted)

krr --install-completion crash

Describe the bug
krr --install-completion crashes when krr was installed via Homebrew.

To Reproduce
Steps to reproduce the behavior:

$ brew tap robusta-dev/homebrew-krr
$ brew install krr
$ krr --install-completion

Expected behavior
Shell completion should be installed successfully.

Screenshots
(two screenshots omitted)

Desktop (please complete the following information):

  • OS: macOS on M2 chip
  • Version: Ventura

Log to stderr

That's because the -v flag emits some logs to stdout, which ends up appending the verbose-related logs to the generated JSON file. So the following steps do not work:

$ python krr.py simple -v -f json > result.json
$ cat result.json | jq
parse error: Invalid numeric literal at line 3, column 7

The expected behaviour: krr should log to stderr and write the JSON output to stdout. Wdyt?
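
A minimal sketch of the idea, assuming standard Python logging (krr's internal logging setup may differ):

# Sketch: send all log records to stderr so stdout carries only the JSON report.
import json
import logging
import sys

logging.basicConfig(stream=sys.stderr, level=logging.DEBUG)
logging.getLogger(__name__).info("verbose logs go to stderr")

print(json.dumps({"scans": []}))  # only the report goes to stdout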

Brainstorming: krr-operator

I was thinking about creating a Kubernetes operator that reads the output of krr and applies the recommended values to the corresponding cluster. However, I don't feel comfortable writing Python and was thinking of creating an operator in Go that calls krr under the hood to grab the recommended values and applies them using the client-go API. So I thought we could officially provide an operator.

Some design ideas:

  • Introduce the operator in the separate repo vs use this one as a monorepo
  • Create a container image for krr-operator
  • It's a long-running, standalone single Pod that checks and applies recommended values on a schedule
    • leader election may be needed for H/A (may not be needed in the first phase)
    • some Prometheus metrics for monitoring (may not be needed in the first phase)
  • A reconciler: subscribe to informers and re-apply recommended values immediately (this is required because once we edit the CPU/memory values, a new deployment rollout may override them; may not be needed in the first phase)

It'd be a perfect opportunity to introduce an operator for this brilliant tool. Wdyt?
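
To make the discussion concrete, a rough Python sketch of the core apply step (the real operator would presumably be Go with client-go, as proposed; the report keys used here are hypothetical):

# Rough sketch of the apply step: read krr's JSON report and patch each
# workload's resources. "scans" and the per-scan keys are hypothetical.
import json
from kubernetes import client, config

config.load_kube_config()
apps = client.AppsV1Api()

with open("krr-report.json") as f:
    scans = json.load(f)["scans"]

for scan in scans:
    patch = {"spec": {"template": {"spec": {"containers": [{
        "name": scan["container"],
        "resources": scan["recommended"],  # e.g. {"requests": {...}, "limits": {...}}
    }]}}}}
    apps.patch_namespaced_deployment(scan["name"], scan["namespace"], patch)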
