cloudworkz / kube-eagle
A Prometheus exporter created to provide a better overview of your resource allocation and utilization in a Kubernetes cluster.
License: MIT License
Hey,
I'm running a cluster on EKS (so no node pools) and have metrics-server and kube-eagle installed; both expose their metrics correctly.
However, after importing the dashboard into Grafana (we're on 6.1.2), all the panels are full of random-walk data:
Drilling into a panel, I only see it filled with the random-walk queries:
Looking at the JSON model again, the targets are set correctly:
"targets": [
  {
    "expr": "sum(eagle_node_resource_usage_memory_bytes{node=~\"$node_pool.*\", node=~\"$node.*\"}) / sum(eagle_node_resource_allocatable_memory_bytes{node=~\"$node_pool.*\", node=~\"$node.*\"})",
    "format": "time_series",
    "instant": true,
    "intervalFactor": 1,
    "refId": "A"
  }
],
Do you have any clue what's going on? I also tried the dance with the node pool variable, but that has no effect on the random data.
EDIT: okay, it seems that the datasource is missing in each of the panels:
"datasource": "${DS_PROMETHEUS}",
I manually added it there for each panel and it imports just fine :)
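Rather than editing every panel by hand, the missing datasource field can be patched in with a short script. A minimal sketch, assuming the dashboard JSON keeps panels either in a flat `panels` list or nested under `rows` (as older Grafana dashboards do):

```python
import json

def add_datasource(dashboard, datasource="${DS_PROMETHEUS}"):
    """Set the datasource on every panel that doesn't already have one."""
    # Older dashboards nest panels under "rows"; newer ones use a flat "panels" list.
    panels = list(dashboard.get("panels", []))
    for row in dashboard.get("rows", []):
        panels.extend(row.get("panels", []))
    for panel in panels:
        panel.setdefault("datasource", datasource)
    return dashboard

# Demo on a tiny in-memory dashboard:
dash = json.loads('{"rows": [{"panels": [{"id": 1}]}]}')
add_datasource(dash)
print(dash["rows"][0]["panels"][0]["datasource"])  # -> ${DS_PROMETHEUS}
```

For a real dashboard, load the exported JSON with `json.load`, run the function, and dump it back before importing into Grafana.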
My Prometheus reports metrics like this as 0 for all nodes.
Any reasons?
eagle_node_resource_usage_cpu_cores{endpoint="http",instance="100.126.64.40:8080",job="kube-eagle",namespace="monitoring",node="ip-172-20-95-140.ec2.internal",pod="kube-eagle-d4c4bbf9f-4vgtx",service="kube-eagle"}
can you tell me if kube-eagle is usable with aws eks ?
regards
Currently there is no way to shard metrics. It could be possible to implement something similar to this PR. Static sharding would be enough for starters.
What do you think? Could I raise a PR for this feature?
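To illustrate the idea of static sharding: each exporter replica gets a shard index and only exports metrics for the nodes whose name hashes into its shard. This is only a sketch of the concept, not kube-eagle code; the function and names are hypothetical:

```python
import zlib

def assigned_to_shard(node_name: str, shard_index: int, total_shards: int) -> bool:
    """Deterministically assign a node to exactly one of `total_shards` shards."""
    # crc32 is stable across processes and restarts, unlike Python's built-in hash().
    return zlib.crc32(node_name.encode()) % total_shards == shard_index

nodes = ["node-a", "node-b", "node-c", "node-d"]
for shard in range(2):
    owned = [n for n in nodes if assigned_to_shard(n, shard, 2)]
    print(f"shard {shard} exports: {owned}")
```

Every node lands in exactly one shard, so two replicas configured with shard indices 0 and 1 would together cover the whole cluster without duplicating series.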
I noticed there are lots of restarts.
5m14s Warning Unhealthy Pod Liveness probe failed: Get http://10.244.2.95:8080/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
10m Warning Unhealthy Pod Readiness probe failed: Get http://10.244.2.95:8080/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
22m Warning Unhealthy Pod Readiness probe failed: Get http://10.244.2.95:8080/health: EOF
22m Normal Killing Pod Killing container with id docker://kube-eagle:Container failed liveness probe.. Container will be killed and recreated.
Do I need to change something in the health-check parameters to make it work properly?
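If the /health endpoint is occasionally slow (for example because the Kubernetes API calls behind it take a while), relaxing the probe settings may help. A possible adjustment; the values are illustrative, not recommendations from the project:

```yaml
livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 10   # give the exporter time to start
  timeoutSeconds: 10        # allow slower responses before counting a failure
  periodSeconds: 30
  failureThreshold: 3
```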
I have just installed kube-eagle and I like it a lot. However, when I spin up a new node pool (via autoscaling or manually), the kube-eagle exporter won't run on it. In comparison, the exporter that prometheus-operator deploys automatically does run on new node pools.
Is this a bug in my deployment configuration? Or is it the intended behavior, and if so, how can I set up kube-eagle so that it scrapes my new node pool?
Hi!!
Amazing stuff you got there.
I have a question/problem.
When a pod dies and is replaced, there's still data for it in kube-eagle, and Prometheus keeps serving it to my Grafana, so it's misleading how many resources are actually being used.
Is there a setting somewhere in kube-eagle, or in Prometheus, that controls where data is stored and for how long?
Thanks
Math
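On the retention question: kube-eagle itself does not store data; how long samples are kept is a Prometheus setting, and Prometheus normally marks series from disappeared targets stale within about five minutes. The retention window is controlled by a Prometheus 2.x flag, for example:

```shell
# Keep 15 days of data
prometheus --storage.tsdb.retention.time=15d
```

If old pods linger in the dashboard much longer than that, the cause is usually the query's time range rather than kube-eagle.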
After deploying Kube-Eagle using the helm chart, I got the following logs:
{"level":"info","msg":"Starting kube eagle v1.1.0","time":"2019-03-27T20:08:20Z"}
{"level":"info","msg":"Creating InCluster config to communicate with Kubernetes master","time":"2019-03-27T20:08:20Z"}
{"level":"info","msg":"Listening on 0.0.0.0:8080","time":"2019-03-27T20:08:20Z"}
{"level":"warning","msg":"Failed to get podMetricses from Kubernetesthe server could not find the requested resource (get pods.metrics.k8s.io)","time":"2019-03-27T20:09:19Z"}
{"level":"error","msg":"Collector 'container_resources' failed after 0.051743s: the server could not find the requested resource (get pods.metrics.k8s.io)","time":"2019-03-27T20:09:19Z"}
{"level":"warning","msg":"Failed to get podList from Kubernetesthe server could not find the requested resource (get nodes.metrics.k8s.io)","time":"2019-03-27T20:09:19Z"}
[...]
Probably something simple; I didn't look into it too much. Maybe a permission issue? (Even though the ClusterRoles and everything for kube-eagle seem correctly configured.)
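The `pods.metrics.k8s.io` / `nodes.metrics.k8s.io` "could not find the requested resource" errors usually mean the resource-metrics API itself isn't available, i.e. metrics-server is missing or unhealthy, rather than a permission problem. A quick way to check:

```shell
# Is the resource-metrics API registered and available?
kubectl get apiservice v1beta1.metrics.k8s.io

# This also fails if metrics-server isn't working:
kubectl top nodes
```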
We recommend using our provided helm chart to deploy kube eagle in your cluster:
Why add all that complexity when a clean Kubernetes YAML file will work [better]? sed is a superior "templating engine" compared to Helm. Can I get the real YAML? I tried downloading the chart and using helm template, but it just gives a gzip error.
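For reference, Helm can render a chart to plain YAML without installing anything. Assuming the chart repo has already been added under the name `kube-eagle`, with Helm 2 syntax that looks roughly like:

```shell
# Fetch and unpack the chart, then render it to plain YAML
helm fetch --untar kube-eagle/kube-eagle
helm template ./kube-eagle > kube-eagle.yaml
```

If `helm fetch` produces a gzip error, the repo URL may be pointing at an HTML page rather than the chart archive.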
In the POD column I only see the kube-eagle pod. How can I determine which container this is if I have multiple pods with the same container name?
Same thing with RAM.
I understand that Prometheus scrapes the metrics from the kube-eagle pod, but I think kube-eagle needs to add labels with the actual pod names where each container is running.
What do you think about this?
So far so good. The data is inserted into Prometheus OK. But in Grafana I'm not sure what node_pool is
or what to set it to. It's just an empty field with a comma for me.
Can you guide me, please?
Thanks a lot!
InitContainers aren't listed in the exposed metrics.
Quick debugging showed that the Go client lists init-container statuses separately from the other containers' statuses.
I created the kube-eagle with your helm chart.
I have a Prometheus operator created with stable/prometheus-operator chart.
The pod logs :
{"level":"info","msg":"Listening on 0.0.0.0:8080","time":"2019-03-04T13:00:35Z"}
{"level":"info","msg":"Creating InCluster config to communicate with Kubernetes master","time":"2019-03-04T13:00:35Z"}
When I port-forward : kubectl port-forward kube-eagle-69c44869d7-qw7sr 8080:8080
http://localhost:8080 => error 404
http://localhost:8080/health => HTTP 200, text is "Ok"
When I look into my Prometheus, I don't have any metric labeled "eagle_*"
Do I have to add some target in my Prometheus to scrape the kube-eagle pod ?
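Yes: with the Prometheus operator, scrape targets are added declaratively via a ServiceMonitor. A sketch; the label names and selectors here are assumptions and must match your kube-eagle Service and your Prometheus CR's serviceMonitorSelector:

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-eagle
  labels:
    release: prometheus-operator   # must match your Prometheus serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: kube-eagle              # must match the kube-eagle Service's labels
  endpoints:
  - port: http                     # the Service port name exposing /metrics
    interval: 30s
```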
Hi,
We're facing an issue where the pod is constantly restarting:
NAME READY STATUS RESTARTS AGE
kube-eagle-6b6c46d47d-pjbzl 1/1 Running 98 3d19h
And the logs are full of the following messages:
2019/08/19 06:40:08 http: superfluous response.WriteHeader call from github.com/prometheus/client_golang/prometheus/promhttp.(*responseWriterDelegator).WriteHeader (delegator.go:59)
Any ideas what might be wrong?
Deployment YAML example:
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: kube-eagle
  name: kube-eagle
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-eagle
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: "8080"
        prometheus.io/scrape: "true"
      labels:
        app: kube-eagle
    spec:
      containers:
      - env:
        - name: LOG_LEVEL
          value: info
        image: quay.io/google-cloud-tools/kube-eagle:1.1.0
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: http
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: kube-eagle
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: http
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: "2"
            memory: 1Gi
          requests:
            cpu: "1"
            memory: 512Mi
      serviceAccount: sa-kube-eagle
      serviceAccountName: sa-kube-eagle
Any ideas how to fix this?
Sometimes I get "insufficient pods" errors on monitored clusters.
For this it would be great to also have running pods per node / max pods per node.
I managed to add such information by editing the "Node CPU" panel and adding the following:
sum (label_replace(kubelet_running_pod_count{instance=~"$node_pool.*", instance=~"$node.*"}, "node", "$1", "instance", "(.*)")) by (node) / sum (kube_node_status_allocatable_pods{node=~"$node_pool.*", node=~"$node.*"}) by (node)
It would be great to have a panel (or additions to this panel) for "running pods", "max pods" and "running/max pods".
I am new to Prometheus Operator, so perhaps that explains my confusion. I deployed kube-eagle via Helm and enabled the Service Monitor. I assume I need to add a scrape config to Prometheus (helm kube-prometheus-stack). They are all in the same monitoring namespace. For some reason I thought that, given the service monitor is in the same namespace, Prometheus would pick it up and start scraping.
Is there an example scrape config?
Looking for standard yaml deployment.
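If the operator's selectors don't match, a plain scrape config also works. A minimal static example (the namespace and port are assumptions); with kube-prometheus-stack it can go under `additionalScrapeConfigs`:

```yaml
scrape_configs:
  - job_name: kube-eagle
    static_configs:
      - targets: ['kube-eagle.monitoring.svc:8080']
```

Note that by default the kube-prometheus-stack chart only selects ServiceMonitors carrying its own release label, which is the usual reason a ServiceMonitor in the same namespace gets ignored.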
In Google Cloud each vCPU and each GB of RAM has a fixed price. Kube Eagle could easily aggregate the allocatable and in-use CPU & RAM and add a pricing metric for that.
This way one could get an overview of how expensive a namespace or deployment is and what saving potential it has (usage compared to allocatable resources).
Challenges:
Resource requests and limits falsify the aggregated node metrics.
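The aggregation itself is simple arithmetic. A sketch of the pricing idea; the per-vCPU and per-GB figures below are placeholders, not real GCP prices:

```python
PRICE_PER_VCPU_HOUR = 0.031     # placeholder, not a real GCP price
PRICE_PER_GB_RAM_HOUR = 0.004   # placeholder

def hourly_cost(cpu_cores: float, memory_bytes: float) -> float:
    """Estimate the hourly cost of a set of allocated or used resources."""
    memory_gb = memory_bytes / 1024**3
    return cpu_cores * PRICE_PER_VCPU_HOUR + memory_gb * PRICE_PER_GB_RAM_HOUR

# e.g. a namespace requesting 4 cores and 8 GiB of RAM:
print(round(hourly_cost(4, 8 * 1024**3), 4))
```

Applied per namespace or deployment, the same formula on usage vs. allocatable resources would give the saving potential mentioned above.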
Nice work! Very useful. But it looks like for cpu the capacity is reported instead of the allocatable.
Currently they are showing up in the "Container CPU" and "Container RAM" tables.
This is causing me a serious issue, as I'm forced to check dozens of terminated containers to find the active container when containers cycle frequently.
EDIT: I'm not sure if this makes sense at all; I think I can deal with it. What do you guys think?
If a node has some reserved resources, its allocatable CPU cores differ from its capacity CPU cores (total node CPU cores).
eagle_node_resource_allocatable_cpu_cores should return the allocatable CPU cores instead of the total core count.
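The difference is visible directly on the node object, where capacity and allocatable are separate fields:

```shell
# Prints "<capacity cpu> <allocatable cpu>"; replace <node-name> with a real node
kubectl get node <node-name> -o jsonpath='{.status.capacity.cpu} {.status.allocatable.cpu}'
```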
It would be a useful feature. From the metrics API we can get all resources, including GPUs (if registered).
K8s 1.16 has removed deprecated API versions.
Do you have any plans to make it work for 1.16 too?
helm install --name=kube-eagle kube-eagle/kube-eagle
Error: validation failed: unable to recognize "": no matches for kind "Deployment" in version "apps/v1beta2"
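The error indicates the chart's Deployment manifest still uses `apps/v1beta2`, which Kubernetes 1.16 removed. The fix is moving the manifest to `apps/v1`, which also makes `spec.selector` mandatory:

```yaml
apiVersion: apps/v1        # was: apps/v1beta2
kind: Deployment
metadata:
  name: kube-eagle
spec:
  selector:                # required in apps/v1
    matchLabels:
      app: kube-eagle
  # (rest of the spec unchanged)
```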
I see there is no data in the collectors yet.
I checked Prometheus, and there are no eagle_scrape_collector_duration_seconds metrics.
I guess they're not added yet?
Hello,
I've just tried to install kube-eagle on my cluster with Helm, and I get Unauthorized errors in the logs:
{"level":"warning","msg":"Failed to get podMetricses from KubernetesUnauthorized","time":"2019-03-15T14:01:05Z"}
{"level":"error","msg":"Collector 'container_resources' failed after 0.456981s: Unauthorized","time":"2019-03-15T14:01:05Z"}
{"level":"warning","msg":"Failed to get podList from KubernetesUnauthorized","time":"2019-03-15T14:01:05Z"}
{"level":"error","msg":"Collector 'node_resource' failed after 0.494984s: Unauthorized","time":"2019-03-15T14:01:05Z"}
kubectl version:
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.0", GitCommit:"0ed33881dc4355495f623c6f22e7dd0b7632b7c0", GitTreeState:"clean", BuildDate:"2018-09-28T15:20:58Z", GoVersion:"go1.11", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.5", GitCommit:"51dd616cdd25d6ee22c83a858773b607328a18ec", GitTreeState:"clean", BuildDate:"2019-01-16T18:14:49Z", GoVersion:"go1.10.7", Compiler:"gc", Platform:"linux/amd64"}
Thanks for your help