Comments (22)

bsnyder788 commented on July 17, 2024

Thanks for the extra info @lots0logs. I'll see if I can reproduce it on a k8s cluster and get to the bottom of why these errors are popping up for you.

from do-agent.

bsnyder788 commented on July 17, 2024

@lots0logs I was not able to reproduce it on a k8s cluster either. Can you try adding --web.listen to the ExecStart line of the systemd unit file, e.g. ExecStart=/opt/digitalocean/bin/do-agent --web.listen --syslog? After running systemctl daemon-reload and systemctl restart do-agent, you should be able to run curl localhost:9100 and get the raw metrics that are being scraped. I am curious whether that output shows duplicate entries for the metrics in your original log.
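
A minimal sketch of the unit-file change described above, in Python. The helper name add_exec_flag and the sample unit text are hypothetical; only the ExecStart line and the --web.listen flag come from this thread. A systemd drop-in override may be a cleaner way to make the same change, and systemctl daemon-reload plus a restart are still needed afterwards.

```python
def add_exec_flag(unit_text, flag):
    """Insert flag right after the binary on each ExecStart= line lacking it."""
    lines = []
    for line in unit_text.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("ExecStart=") and flag not in parts:
            # Place the flag directly after the executable path.
            parts.insert(1, flag)
            line = " ".join(parts)
        lines.append(line)
    return "\n".join(lines) + "\n"


# Hypothetical unit file contents; the real do-agent unit may differ.
SAMPLE_UNIT = """\
[Service]
ExecStart=/opt/digitalocean/bin/do-agent --syslog
Restart=always
"""

print(add_exec_flag(SAMPLE_UNIT, "--web.listen"))
```

The helper leaves a line alone if it already carries the flag, so running it twice should not add the flag a second time.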

bsnyder788 commented on July 17, 2024

> Is your 20.04 image the stock DO image or a custom 20.04 image?

> On Fri, Oct 30, 2020, 5:15 AM plutocrat wrote: Confirmed here. Running 4 Ubuntu 20.04 droplets. On all 4 do-agent cpu is around 95% all the time. Tried the --web.listen instruction, but apparently nothing listening on that port when I do. Installed version: 3.7.1

> Both affected droplets were built from the stock DO Ubuntu 20.04 LTS image with "Monitoring" option checked at the DO dashboard https://cloud.digitalocean.com/droplets/new.

Thanks. I will try to reproduce from that.

bsnyder788 commented on July 17, 2024

I couldn't reproduce it on a myriad of 20.04 droplets either, but I went ahead and made a new beta release that disables the collection of /boot mountpoints. If some of you could give it a try to see whether it now works on your specific droplets, that would be fantastic. Install it via curl -SsL https://repos.insights.digitalocean.com/install.sh | sudo BETA=1 bash and please let me know if that fixes your issues. cc @UnKnoWn-Consortium @lots0logs @plutocrat
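
The thread does not state why /boot/efi was reported twice. One plausible cause, offered here only as an assumption, is the same (device, mountpoint, fstype) triple appearing more than once in the mount table, since those are the three labels in the error. A minimal Python sketch for checking text in the /proc/mounts format (device, mountpoint, fstype, options, dump, pass); the function name and the sample table are hypothetical:

```python
from collections import Counter


def duplicate_mounts(mounts_text):
    """Return (device, mountpoint, fstype) triples listed more than once."""
    triples = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 3:
            triples.append((fields[0], fields[1], fields[2]))
    counts = Counter(triples)
    return sorted(triple for triple, n in counts.items() if n > 1)


# Hypothetical mount table, not captured from an affected droplet.
SAMPLE_MOUNTS = """\
/dev/vda1 / ext4 rw,relatime 0 0
/dev/vda15 /boot/efi vfat rw,relatime 0 0
/dev/vda15 /boot/efi vfat rw,relatime 0 0
"""

for device, mountpoint, fstype in duplicate_mounts(SAMPLE_MOUNTS):
    print(f"duplicate entry: {device} on {mountpoint} ({fstype})")
```

On a real droplet the input could be the contents of /proc/mounts. An empty result would suggest the duplicate comes from somewhere else.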

UnKnoWn-Consortium commented on July 17, 2024

@bsnyder788 The 3.8.0 pre-release you have just made seems to have fixed the issue. At least it is no longer spamming those two error messages and taking a whole lot of CPU resources.

plutocrat commented on July 17, 2024

24 hours later, and it's still OK.
Note: if you've been affected by this issue you might want to clean out your systemd journal logs. I just got rid of 3.5 GB of spam from mine using "/bin/journalctl --vacuum-size=500M". Your mileage may vary: there may be more subtle ways to remove the logs from just do-agent, although I haven't found them.

lots0logs commented on July 17, 2024

Sorry for the delayed response. We had a hurricane here and I was without power for a few days. I'm glad to see that y'all were able to identify the problem and implement a fix! Thanks!!

bsnyder788 commented on July 17, 2024

3.8.0 is officially released. I am going to close this. Please open a new issue if you see anything similar in the future. Thanks!

bsnyder788 commented on July 17, 2024

@lots0logs I can't reproduce this by just spinning up a 20.04 droplet with the agent. It seems to be working fine on the ten 20.04 droplets I just spun up, and no spammy logs are hitting the journal. We did have a similar report (#228), but in that case the user was using DOKS and it was with an alpha version of kube-state-metrics. I notice your logs also say "k8s-cluster-stage..". Do you have kube-state-metrics installed? If so, which version?

lots0logs commented on July 17, 2024

@bsnyder788 Yeah, looks like I have a rancher/coreos-kube-state-metrics:v1.9.5 container running on one of my nodes, though the issue I described happens on all nodes.

UnKnoWn-Consortium commented on July 17, 2024

I have encountered exactly the same issue (even the systemd message is the same) with do-agent on two droplets running Ubuntu 20.04.1. One was set up yesterday, while the other has been in use for just a month or so. I have had to stop the do-agent service.

P.S. I am not running Kubernetes on the two affected droplets.

UnKnoWn-Consortium commented on July 17, 2024

> @lots0logs I was not able to reproduce it on a k8s cluster either. Can you try adding --web.listen to the ExecStart line of the systemd unit file, e.g. ExecStart=/opt/digitalocean/bin/do-agent --web.listen --syslog? After running systemctl daemon-reload and systemctl restart do-agent, you should be able to run curl localhost:9100 and get the raw metrics that are being scraped. I am curious whether that output shows duplicate entries for the metrics in your original log.

Following your guide, I am able to get the following:

2 error(s) occurred:
* collected metric "node_filesystem_size_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:1.09422592e+08 > } was collected before with the same name and label values
* collected metric "node_filesystem_free_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:9.9854336e+07 > } was collected before with the same name and label values
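
For readers who want to summarize output like the above, here is a minimal Python sketch that pulls the metric name and label values out of each "was collected before" line. The regular expressions are written against the exact text quoted in this comment and are an assumption about its general shape, not a documented format; the function names are hypothetical.

```python
import re
from collections import Counter

# Matches one duplicate-collection error line as quoted in this thread.
ERROR_RE = re.compile(
    r'collected metric "(?P<name>[^"]+)" \{(?P<body>.*)\} '
    r"was collected before with the same name and label values"
)
LABEL_RE = re.compile(r'label:<name:"([^"]+)" value:"([^"]*)" >')


def parse_duplicates(text):
    """Return a list of (metric_name, labels_dict), one per error line."""
    found = []
    for line in text.splitlines():
        match = ERROR_RE.search(line)
        if match:
            labels = dict(LABEL_RE.findall(match.group("body")))
            found.append((match.group("name"), labels))
    return found


def mountpoint_counts(duplicates):
    """Count how many duplicate errors mention each mountpoint label."""
    return Counter(labels.get("mountpoint", "?") for _, labels in duplicates)


# The two error lines reported above, copied verbatim.
SAMPLE = """\
2 error(s) occurred:
* collected metric "node_filesystem_size_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:1.09422592e+08 > } was collected before with the same name and label values
* collected metric "node_filesystem_free_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:9.9854336e+07 > } was collected before with the same name and label values
"""

for name, labels in parse_duplicates(SAMPLE):
    print(name, labels)
```

Grouping by the mountpoint label makes it easy to see that both errors here concern /boot/efi.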

plutocrat commented on July 17, 2024

Confirmed here. Running 4 Ubuntu 20.04 droplets. On all 4, do-agent CPU is running at around 95% all the time.
Tried the --web.listen instruction, but apparently nothing is listening on that port when I do.
I can also confirm the error messages in /var/log/syslog and journalctl -xe.
Installed version: 3.7.1
For now I've just got rid of it with apt purge do-agent.

bsnyder788 commented on July 17, 2024

> @lots0logs I was not able to reproduce it on a k8s cluster either. Can you try adding --web.listen to the ExecStart line of the systemd unit file, e.g. ExecStart=/opt/digitalocean/bin/do-agent --web.listen --syslog? After running systemctl daemon-reload and systemctl restart do-agent, you should be able to run curl localhost:9100 and get the raw metrics that are being scraped. I am curious whether that output shows duplicate entries for the metrics in your original log.

> Following your guide, I am able to get the following:

> 2 error(s) occurred:
> * collected metric "node_filesystem_size_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:1.09422592e+08 > } was collected before with the same name and label values
> * collected metric "node_filesystem_free_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:9.9854336e+07 > } was collected before with the same name and label values

When you do the curl localhost:9100 what is the raw output?

UnKnoWn-Consortium commented on July 17, 2024

> @lots0logs I was not able to reproduce it on a k8s cluster either. Can you try adding --web.listen to the ExecStart line of the systemd unit file, e.g. ExecStart=/opt/digitalocean/bin/do-agent --web.listen --syslog? After running systemctl daemon-reload and systemctl restart do-agent, you should be able to run curl localhost:9100 and get the raw metrics that are being scraped. I am curious whether that output shows duplicate entries for the metrics in your original log.

> Following your guide, I am able to get the following:

> 2 error(s) occurred:
> * collected metric "node_filesystem_size_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:1.09422592e+08 > } was collected before with the same name and label values
> * collected metric "node_filesystem_free_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:9.9854336e+07 > } was collected before with the same name and label values

> When you do the curl localhost:9100 what is the raw output?

The quoted part was exactly what I got when I did curl localhost:9100.

UnKnoWn-Consortium commented on July 17, 2024

> Is your 20.04 image the stock DO image or a custom 20.04 image?

> On Fri, Oct 30, 2020, 5:15 AM plutocrat wrote: Confirmed here. Running 4 Ubuntu 20.04 droplets. On all 4 do-agent cpu is around 95% all the time. Tried the --web.listen instruction, but apparently nothing listening on that port when I do. Installed version: 3.7.1

Both affected droplets were built from the stock DO Ubuntu 20.04 LTS image with "Monitoring" option checked at the DO dashboard https://cloud.digitalocean.com/droplets/new.

bsnyder788 commented on July 17, 2024

> @lots0logs I was not able to reproduce it on a k8s cluster either. Can you try adding --web.listen to the ExecStart line of the systemd unit file, e.g. ExecStart=/opt/digitalocean/bin/do-agent --web.listen --syslog? After running systemctl daemon-reload and systemctl restart do-agent, you should be able to run curl localhost:9100 and get the raw metrics that are being scraped. I am curious whether that output shows duplicate entries for the metrics in your original log.

> Following your guide, I am able to get the following:

> 2 error(s) occurred:
> * collected metric "node_filesystem_size_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:1.09422592e+08 > } was collected before with the same name and label values
> * collected metric "node_filesystem_free_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:9.9854336e+07 > } was collected before with the same name and label values

> When you do the curl localhost:9100 what is the raw output?

> The quoted part was exactly what I got when I did curl localhost:9100.

Ok, thanks. I wanted to make sure that was all the info we could discern.

bsnyder788 commented on July 17, 2024

> @bsnyder788 The 3.8.0 pre-release you have just made seems to have fixed the issue.

Thank you so much for testing it out @UnKnoWn-Consortium; it is great that it is helping. I will leave this release in beta over the weekend and check in early next week to make sure no other regressions or issues have shown up for you by then. If all is good, I will promote 3.8.0 to stable.

UnKnoWn-Consortium commented on July 17, 2024

> @bsnyder788 The 3.8.0 pre-release you have just made seems to have fixed the issue.

> Thank you so much for testing it out @UnKnoWn-Consortium; it is great that it is helping. I will leave this release in beta over the weekend and check in early next week to make sure no other regressions or issues have shown up for you by then. If all is good, I will promote 3.8.0 to stable.

Okay, I will keep an eye out and see if anything goes astray with it (I seriously hope not). Have a nice weekend btw.

plutocrat commented on July 17, 2024

Also confirming DO stock Ubuntu 20.04 build.
Have installed the beta release on one of the four affected boxes, and it's showing healthy, near-zero CPU. Thanks. Will monitor.

bsnyder788 commented on July 17, 2024

Thanks all! I'm going to go ahead and release 3.8.0 on the stable branch as well.
