
kube-eagle's People

Contributors

krystianity, mumrau, weeco


kube-eagle's Issues

Allow custom labels

I was thinking about a custom labels feature. For example, on DigitalOcean the node pool name is available as a node label:

(screenshot: DigitalOcean node labels showing the node pool name)

This could then be used directly as the node pool in the dashboard.

Is there any way to do this?

Thx, great work!!
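
For what it's worth, the dashboard queries only ever treat node_pool as a node-name prefix (node=~"$node_pool.*"), so a label_replace over the node name can derive a pool label without changing kube-eagle. This is just a sketch and assumes node names end in "<pool>-<suffix>":

sum by (node_pool) (
  label_replace(eagle_node_resource_usage_memory_bytes, "node_pool", "$1", "node", "(.*)-[^-]+$")
)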

grafana dashboards full of "Test data: random walk"

Hey,

I'm running a cluster on EKS (so no node pools) and have metrics-server and kube-eagle installed, and they are exposing the metrics correctly.

Now, on importing the dashboard into Grafana (we're on 6.1.2), I get all the panels full of random walk data:

(screenshot: dashboard panels filled with random walk data)

Drilling into the panel I only see it filled with the random walk queries:

(screenshot: panel query editor showing the random walk queries)

Looking at the exported JSON model again, the targets are set correctly:

"targets": [
        {
          "expr": "sum(eagle_node_resource_usage_memory_bytes{node=~\"$node_pool.*\", node=~\"$node.*\"}) / sum(eagle_node_resource_allocatable_memory_bytes{node=~\"$node_pool.*\", node=~\"$node.*\"})",
          "format": "time_series",
          "instant": true,
          "intervalFactor": 1,
          "refId": "A"
        }
      ],

Do you have any clue what's going on? I also tried the dance with the node pool variable, but that has no effect on the random data.

EDIT: okay, it seems the datasource is missing in each of the panels:

"datasource": "${DS_PROMETHEUS}",

I manually added it there for each panel and it imports just fine :)
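
For reference, a minimal sketch of how each panel object ends up looking once the field is added back (the query is shortened here; the real expressions are as in the JSON model above):

{
  "datasource": "${DS_PROMETHEUS}",
  "targets": [
    {
      "expr": "sum(eagle_node_resource_usage_memory_bytes) / sum(eagle_node_resource_allocatable_memory_bytes)",
      "format": "time_series",
      "refId": "A"
    }
  ]
}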

Metrics resource usage CPU and RAM all zeros

My Prometheus reports metrics like this, equal to 0 for all nodes.

Any ideas why?

eagle_node_resource_usage_cpu_cores{endpoint="http",instance="100.126.64.40:8080",job="kube-eagle",namespace="monitoring",node="ip-172-20-95-140.ec2.internal",pod="kube-eagle-d4c4bbf9f-4vgtx",service="kube-eagle"}
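
The usage values come from the Kubernetes metrics API, so one thing worth ruling out (just an assumption about the cause, not a confirmed fix) is whether metrics-server itself reports sensible numbers:

kubectl top nodes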

Add metrics sharding feature

Currently there is no way to shard metrics. It should be possible to implement something similar to this PR. Static sharding would be enough for starters.

What do you think? Could I raise a PR for this feature?

Lots of pod restarts

I noticed there are lots of restarts.

5m14s   Warning   Unhealthy          Pod           Liveness probe failed: Get http://10.244.2.95:8080/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
10m     Warning   Unhealthy          Pod           Readiness probe failed: Get http://10.244.2.95:8080/health: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
22m     Warning   Unhealthy          Pod           Readiness probe failed: Get http://10.244.2.95:8080/health: EOF
22m     Normal    Killing            Pod           Killing container with id docker://kube-eagle:Container failed liveness probe.. Container will be killed and recreated.

Do I need to change something in the health check parameters to make it work properly?
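
The probe failures are client-side timeouts, so relaxing the probe settings on the kube-eagle container is one thing to try. A sketch with purely illustrative values (standard Kubernetes livenessProbe fields, not a confirmed fix):

livenessProbe:
  httpGet:
    path: /health
    port: http
  initialDelaySeconds: 10
  periodSeconds: 20
  timeoutSeconds: 10
  failureThreshold: 5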

[feature/bug?] Autoscaling and new node pool metrics

I have just installed kube-eagle and I like it a lot. However, when I spawn a new node pool (with autoscaling or manually), the kube-eagle exporter won't install itself on it. In comparison, the exporter that prometheus-operator deploys automatically does this on new node pools.

Is this a bug in my deployment configuration? Or is this the normal use case, and if so, how can I set up kube-eagle so that it scrapes my new node pool?

Cache of old pods

Hi!!

Amazing stuff you got there.
I have a question/problem.

When a pod dies and is replaced, there's still data for it in kube-eagle, and Prometheus keeps pushing it to my Grafana, so it's misleading how many resources are actually being used.

Is there a setting somewhere in kube-eagle, or in Prometheus, that controls where this data is stored and for how long?

Thanks
Math

Issue deploying Kube-Eagle

After deploying Kube-Eagle using the helm chart, I got the following logs:

{"level":"info","msg":"Starting kube eagle v1.1.0","time":"2019-03-27T20:08:20Z"}
{"level":"info","msg":"Creating InCluster config to communicate with Kubernetes master","time":"2019-03-27T20:08:20Z"}
{"level":"info","msg":"Listening on 0.0.0.0:8080","time":"2019-03-27T20:08:20Z"}
{"level":"warning","msg":"Failed to get podMetricses from Kubernetesthe server could not find the requested resource (get pods.metrics.k8s.io)","time":"2019-03-27T20:09:19Z"}
{"level":"error","msg":"Collector 'container_resources' failed after 0.051743s: the server could not find the requested resource (get pods.metrics.k8s.io)","time":"2019-03-27T20:09:19Z"}
{"level":"warning","msg":"Failed to get podList from Kubernetesthe server could not find the requested resource (get nodes.metrics.k8s.io)","time":"2019-03-27T20:09:19Z"}

[...]

Probably something simple; I didn't look into it too much. Maybe a permission issue? (Even though the ClusterRoles and everything else for Kube-Eagle seem correctly configured.)
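
The errors all point at the metrics.k8s.io API, so a quick check (assuming metrics-server is meant to provide it in this cluster) would be whether that API is registered and serving at all:

kubectl get apiservice v1beta1.metrics.k8s.io
kubectl top nodes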

Can I just get the yaml?

We recommend using our provided helm chart to deploy kube eagle in your cluster:

Why add all that complexity when a clean k8s yaml file will work [better]? Sed is a superior "templating engine" compared to Helm. Can I get the real yaml? I tried downloading the chart and using helm template but it just gives a gzip error.
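
If plain YAML is the goal, rendering the chart locally may work; this is only a sketch and assumes the chart sources have been cloned or unpacked into ./kube-eagle-chart (path hypothetical), which also sidesteps whatever produced the gzip error:

helm template ./kube-eagle-chart > kube-eagle.yaml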

Incorrect data in POD column in CPU/RAM tables on dashboard

In the POD column I only see the kube-eagle pod. How can I determine which pod a container belongs to if I have multiple pods with the same container name?
(screenshot: Container CPU table, with only the kube-eagle pod in the POD column)
Same thing with RAM:
(screenshot: Container RAM table)

I understand that Prometheus scrapes the metrics from the kube-eagle pod, but I think kube-eagle needs to add labels with the actual pod names the containers are running in.

What do you think about this?

What is node pool and how to set it

So far so good. The data is inserted into Prometheus OK, but in Grafana I'm not sure what node_pool is or what to set it to. It's just an empty field for me, with a comma.

Can you guide me, please?

Thx a lot!
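
For context, the dashboard queries only use node_pool as a node-name prefix, e.g. node=~"$node_pool.*" in the JSON model shown earlier, so it effectively selects all nodes whose names start with that value. A hypothetical example for a pool named "pool-1":

eagle_node_resource_usage_cpu_cores{node=~"pool-1.*"}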

InitContainers are not listed

InitContainers aren't listed in the exposed metrics.

Quick debugging showed that the Go client lists init container statuses separately from the other containers' statuses.

No data / metrics in Prometheus, error 404 for localhost:8080

I created kube-eagle with your helm chart.

I have a Prometheus Operator installed with the stable/prometheus-operator chart.

The pod logs :

{"level":"info","msg":"Listening on 0.0.0.0:8080","time":"2019-03-04T13:00:35Z"}
{"level":"info","msg":"Creating InCluster config to communicate with Kubernetes master","time":"2019-03-04T13:00:35Z"}

When I port-forward: kubectl port-forward kube-eagle-69c44869d7-qw7sr 8080:8080

http://localhost:8080 => error 404
http://localhost:8080/health => HTTP 200, text is "Ok"

When I look in my Prometheus, I don't have any metrics named "eagle_*".

Do I have to add a target to my Prometheus so that it scrapes the kube-eagle pod?

Pod constantly restarting

Hi,

We're facing an issue where the pod is constantly restarting:

NAME                                                   READY   STATUS    RESTARTS   AGE
kube-eagle-6b6c46d47d-pjbzl                            1/1     Running   98         3d19h

And the logs are full of the following messages:

2019/08/19 06:40:08 http: superfluous response.WriteHeader call from github.com/prometheus/client_golang/prometheus/promhttp.(*responseWriterDelegator).WriteHeader (delegator.go:59)

Any ideas what might be wrong?

Deployment YAML example:

apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  labels:
    app: kube-eagle
  name: kube-eagle
spec:
  replicas: 1
  selector:
    matchLabels:
      app: kube-eagle
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      annotations:
        prometheus.io/path: /metrics
        prometheus.io/port: "8080"
        prometheus.io/scrape: "true"
      labels:
        app: kube-eagle
    spec:
      containers:
      - env:
        - name: LOG_LEVEL
          value: info
        image: quay.io/google-cloud-tools/kube-eagle:1.1.0
        livenessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: http
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        name: kube-eagle
        ports:
        - containerPort: 8080
          name: http
          protocol: TCP
        readinessProbe:
          failureThreshold: 3
          httpGet:
            path: /health
            port: http
            scheme: HTTP
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        resources:
          limits:
            cpu: "2"
            memory: 1Gi
          requests:
            cpu: "1"
            memory: 512Mi
      serviceAccount: sa-kube-eagle
      serviceAccountName: sa-kube-eagle

Any ideas how to fix this?

[Feature Request] Running Pods/Max Pods per node

Sometimes I get "insufficient pods" errors on monitored clusters.

For this, it would be great to also have running pods per node / max pods per node.

I managed to add this information by editing the "Node CPU" panel and adding the following:

sum (label_replace(kubelet_running_pod_count{instance=~"$node_pool.*", instance=~"$node.*"}, "node", "$1", "instance", "(.*)")) by (node) / sum (kube_node_status_allocatable_pods{node=~"$node_pool.*", node=~"$node.*"}) by (node)

(screenshot of the resulting panel)

It would be great to have a panel (or additions to this panel) for "running pods", "max pods", and "running/max pods".

Prometheus Scrape Config Example

I am new to Prometheus Operator, so perhaps that explains my confusion. I deployed kube-eagle via helm and enabled the Service Monitor. I assume I need to add a scrape config to Prometheus (helm kube-prometheus-stack). They are all in the same monitoring namespace. For some reason I thought that, given the service monitor is in the same namespace, Prometheus would pick it up and start scraping.

Is there an example scrape config?
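
With the operator, the ServiceMonitor replaces a hand-written scrape config, but it is only picked up if its labels match the Prometheus resource's serviceMonitorSelector (for kube-prometheus-stack that is usually the Helm release label). A sketch only; the release value, the service's app label and the port name are assumptions here:

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: kube-eagle
  namespace: monitoring
  labels:
    release: kube-prometheus-stack   # assumed; must match Prometheus' serviceMonitorSelector
spec:
  selector:
    matchLabels:
      app: kube-eagle
  endpoints:
    - port: http
      path: /metrics
      interval: 30s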

Dashboard Fails to Import

Grafana 8.3.5

When I import using the 9871 ID, the panels below the top row don't load, and when I edit the panels all the queries show the Grafana datasource and random walk.

(screenshots of the broken panels showing the Grafana random walk datasource)

Add feature to expose pricing

In Google Cloud, each vCPU and each GB of RAM has a fixed price. Kube Eagle could easily aggregate the allocatable and in-use CPU & RAM and add a pricing metric for that.

This way one could get an overview of how expensive a namespace or deployment is and what savings potential it has (usage compared to allocatable resources).
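
A rough sketch of what such a query could look like today using the existing allocatable metric; the per-GB price below is purely hypothetical and would differ per provider, zone and machine type:

# hypothetical flat price of 0.004 USD per GB of allocatable RAM per hour
sum(eagle_node_resource_allocatable_memory_bytes) / 1024 / 1024 / 1024 * 0.004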

Challenges:

  • How to define the CPU / RAM pricing? Each zone, provider and machine type (e.g. spot instances, on-demand, preemptible, commitment) may have slightly different CPU & RAM pricing

allocatable vs. capacity

Nice work! Very useful. But it looks like for CPU the capacity is reported instead of the allocatable value.

Filter terminated containers

Currently they are showing up in the Container CPU and Container RAM tables.

This is causing me a serious issue, as I'm forced to check dozens of terminated containers to find the active one when containers cycle frequently.

EDIT: I'm not sure if this makes sense at all, I think I can deal with it. What do you guys think?

Collectors don't have data yet

I see there is no data in collectors yet.

I checked Prometheus and there is no eagle_scrape_collector_duration_seconds metric.

I guess it's not added yet?

Add option to exclude completed pods?

Hi,

We're facing an issue when we have lots of completed pods:

# kubectl get pods --all-namespaces | grep -i completed | wc -l
   11863

And kube-eagle struggles to iterate over that many objects, and as a result the Kubernetes control plane crashes:

(screenshot of the control plane crash)

Is there a way to exclude pods in the Completed state?
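
Until there is such an option, one possible stopgap (a suggestion only, not something kube-eagle provides) is to clean up succeeded pods so the exporter has fewer objects to iterate over, e.g. per namespace:

kubectl delete pods --field-selector=status.phase==Succeeded -n <namespace>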

Unauthorized

Hello,

I've just tried to install kube-eagle on my cluster with Helm, and I get an Unauthorized error in the logs:

{"level":"warning","msg":"Failed to get podMetricses from KubernetesUnauthorized","time":"2019-03-15T14:01:05Z"}
{"level":"error","msg":"Collector 'container_resources' failed after 0.456981s: Unauthorized","time":"2019-03-15T14:01:05Z"}
{"level":"warning","msg":"Failed to get podList from KubernetesUnauthorized","time":"2019-03-15T14:01:05Z"}
{"level":"error","msg":"Collector 'node_resource' failed after 0.494984s: Unauthorized","time":"2019-03-15T14:01:05Z"}

Kube version:

Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.0", GitCommit:"0ed33881dc4355495f623c6f22e7dd0b7632b7c0", GitTreeState:"clean", BuildDate:"2018-09-28T15:20:58Z", GoVersion:"go1.11", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.5", GitCommit:"51dd616cdd25d6ee22c83a858773b607328a18ec", GitTreeState:"clean", BuildDate:"2019-01-16T18:14:49Z", GoVersion:"go1.10.7", Compiler:"gc", Platform:"linux/amd64"}

Thanks for your help
