Comments (22)
Thanks for the extra info, @lots0logs. I'll see if I can reproduce it on a k8s cluster and get to the bottom of why these errors are popping up for you.
from do-agent.
@lots0logs I was not able to reproduce it on a k8s cluster either. Can you try adding `--web.listen` to the systemd unit file (in the `ExecStart` line), e.g. `ExecStart=/opt/digitalocean/bin/do-agent --web.listen --syslog`? After doing a `systemctl daemon-reload` and a `systemctl restart do-agent`, you should be able to do a `curl localhost:9100` and get the raw metrics that are being scraped. I would be curious to see if that is somehow having duplicate entries for the metrics in your original log.
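If the endpoint does return plain metrics, duplicates can be spotted by grouping samples on metric name plus label set. The helper below is a hypothetical debugging sketch, not part of do-agent; it assumes the Prometheus text exposition format and uses a simplified line grammar, so treat it as a quick check rather than a full parser.

```python
import re
from collections import Counter

# Simplified sample-line grammar (an assumption, not a full exposition parser):
#   metric_name{label="value",...} value [timestamp]
SAMPLE_RE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{(.*)\})?\s+\S+')
# One label pair; allows backslash-escaped characters inside the value.
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def find_duplicate_series(exposition: str) -> list[tuple[str, tuple]]:
    """Return (metric name, sorted label pairs) keys that occur more than once."""
    counts = Counter()
    for line in exposition.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if match is None:
            continue  # not a sample line this simplified grammar understands
        name, _, label_text = match.groups()
        # Sort label pairs so label order does not affect the series key.
        labels = tuple(sorted(LABEL_RE.findall(label_text or "")))
        counts[(name, labels)] += 1
    return [key for key, n in counts.items() if n > 1]
```

For example, saving the `curl localhost:9100` output to a file and passing its contents to `find_duplicate_series` would list every repeated series; an empty list means no duplicates were found.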
from do-agent.
Is your 20.04 image the stock DO image or a custom 20.04 image?
…
On Fri, Oct 30, 2020, 5:15 AM plutocrat @.***> wrote: Confirmed here. Running 4 Ubuntu 20.04 droplets. On all 4 do-agent cpu is around 95% all the time. Tried the --web.listen instruction, but apparently nothing listening on that port when I do. Installed version: 3.7.1 — You are receiving this because you were mentioned. Reply to this email directly, view it on GitHub <#233 (comment)>, or unsubscribe https://github.com/notifications/unsubscribe-auth/AAXDLP3EWU6SM2GGR2YRJQDSNJ74XANCNFSM4SQ6RVVQ .

Both affected droplets were built from the stock DO Ubuntu 20.04 LTS image with the "Monitoring" option checked in the DO dashboard (https://cloud.digitalocean.com/droplets/new).
Thanks. I will try to reproduce it from that.
from do-agent.
I couldn't reproduce it on a myriad of 20.04 droplets either, but I went ahead and made a new beta release that disables the collection of `/boot` mountpoints. If some of you could give it a try and see whether it now works on your specific droplets, that would be fantastic. You can install it via `curl -SsL https://repos.insights.digitalocean.com/install.sh | sudo BETA=1 bash`. Please let me know if that fixes your issues. cc @UnKnoWn-Consortium @lots0logs @plutocrat
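The thread does not show how the beta implements this. As an illustrative sketch only, assuming mounts are read from `/proc/mounts`-style text and using hypothetical names (`Mount`, `parse_mounts`, `filesystems_to_report`), a collector could skip anything at or under `/boot` and drop repeated `(device, fstype, mountpoint)` entries, so no filesystem series is emitted twice:

```python
from typing import NamedTuple


class Mount(NamedTuple):
    """Hypothetical record for one mount entry."""
    device: str
    mountpoint: str
    fstype: str


def parse_mounts(text: str) -> list[Mount]:
    """Parse /proc/mounts-style lines: device mountpoint fstype options dump pass."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # ignore empty or malformed lines
        mounts.append(Mount(device=fields[0], mountpoint=fields[1], fstype=fields[2]))
    return mounts


def filesystems_to_report(mounts: list[Mount], excluded_root: str = "/boot") -> list[Mount]:
    """Hypothetical filter: skip mounts at or under excluded_root, drop exact repeats."""
    seen = set()
    result = []
    for m in mounts:
        if m.mountpoint == excluded_root or m.mountpoint.startswith(excluded_root + "/"):
            continue  # mirrors the beta's "don't collect /boot mountpoints" idea
        key = (m.device, m.fstype, m.mountpoint)
        if key in seen:
            continue  # a second identical entry would yield a duplicate series
        seen.add(key)
        result.append(m)
    return result
```

Matching `/boot` itself or a `/boot/` prefix, rather than any path starting with the string `/boot`, avoids accidentally skipping an unrelated mountpoint such as `/bootstrap`. Whether repeated mount entries were actually the root cause here was never confirmed in this thread.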
from do-agent.
@bsnyder788 The 3.8.0 pre-release you have just made seems to have fixed the issue. At least it is no longer spamming those two error messages and taking a whole lot of CPU resources.
from do-agent.
24 hours later, and it's still OK.
Note: if you've been affected by this issue, you might want to clean out your systemd journal. I just got rid of 3.5 GB of spam from mine using `/bin/journalctl --vacuum-size=500M`. Your mileage may vary: there may be more subtle ways to remove just the do-agent logs, although I haven't found them.
from do-agent.
Sorry for the delayed response. We had a hurricane here and I was without power for a few days. I'm glad to see that y'all were able to identify the problem and implement a fix! Thanks!!
from do-agent.
3.8.0 is officially released. I am going to close this. Please open a new issue if you see anything similar in the future. Thanks!
from do-agent.
@lots0logs I can't reproduce this by just spinning up a 20.04 droplet with the agent; it seems to be working just fine on the ten 20.04 droplets I just spun up (and there are no spammy logs hitting the journal). We did have a similar report (#228), but in that case the user was using DOKS with an alpha version of kube-state-metrics. I notice your logs also say "k8s-cluster-stage..". Do you have kube-state-metrics installed? If so, which version of kube-state-metrics?
from do-agent.
@bsnyder788 Yeah, looks like I have a `rancher/coreos-kube-state-metrics:v1.9.5` container running on one of my nodes, though the issue I described happens on all nodes.
from do-agent.
I have encountered exactly the same issue (even the systemd message is the same) with do-agent on two droplets running Ubuntu 20.04.1. One was set up yesterday, while the other has been in use for just a month or so. I had to stop the do-agent service.
P.S. I am not running Kubernetes on the two affected droplets.
from do-agent.
> ...you should be able to do a `curl localhost:9100` and get the raw metrics that are being scraped. I would be curious to see if that is somehow having duplicate entries for the metrics in your original log.
Following your guide, I am able to get the following:
2 error(s) occurred:
* collected metric "node_filesystem_size_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:1.09422592e+08 > } was collected before with the same name and label values
* collected metric "node_filesystem_free_bytes" { label:<name:"device" value:"/dev/vda15" > label:<name:"fstype" value:"vfat" > label:<name:"mountpoint" value:"/boot/efi" > gauge:<value:9.9854336e+07 > } was collected before with the same name and label values
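The error wording appears to come from the Prometheus Go client library (client_golang), which returns an error when two collected samples share the same metric name and label values within one gather. Both entries here point at the vfat `/boot/efi` partition on `/dev/vda15`. To see at a glance which metrics and mountpoints are affected in a noisy syslog, a small hypothetical helper such as `summarize_duplicate_errors` could be used; it assumes the exact message layout quoted above and works whether the errors are on separate lines or flattened onto one.

```python
import re
from collections import Counter

# Assumes the message layout quoted in this thread:
#   collected metric "NAME" { label:<name:"K" value:"V" > ... } was collected before ...
ERROR_RE = re.compile(r'collected metric "([^"]+)" \{(.*?)\} was collected before')
LABEL_RE = re.compile(r'label:<name:"([^"]*)" value:"([^"]*)" >')


def summarize_duplicate_errors(log_text: str) -> Counter:
    """Count duplicate-metric errors per (metric name, mountpoint label)."""
    summary = Counter()
    for match in ERROR_RE.finditer(log_text):
        name, body = match.groups()
        labels = dict(LABEL_RE.findall(body))
        # "?" marks errors for metrics that carry no mountpoint label.
        summary[(name, labels.get("mountpoint", "?"))] += 1
    return summary
```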
from do-agent.
Confirmed here. Running 4 Ubuntu 20.04 droplets. On all 4, do-agent CPU usage is around 95% all the time.
Tried the `--web.listen` instruction, but apparently nothing is listening on that port when I do.
I can also confirm the error messages in `/var/log/syslog` and `journalctl -xe`.
Installed version: 3.7.1
For now I've just got rid of it with `apt purge do-agent`.
from do-agent.
> Following your guide, I am able to get the following:
> 2 error(s) occurred: ...

When you do the `curl localhost:9100`, what is the raw output?
from do-agent.
> When you do the `curl localhost:9100`, what is the raw output?

The quoted part was exactly what I got when I did `curl localhost:9100`.
from do-agent.
> The quoted part was exactly what I got when I did `curl localhost:9100`.
Ok, thanks. I wanted to make sure that was all the info we could discern.
from do-agent.
> @bsnyder788 The 3.8.0 pre-release you have just made seems to have fixed the issue.

Thank you so much for testing it out, @UnKnoWn-Consortium; it's great that it is helping. I will leave this release in beta over the weekend and check in early next week to make sure no other regressions or issues have shown up for you by then. If all is good, I will promote 3.8.0 to stable.
from do-agent.
> I will leave this release in beta over the weekend and check in early next week...

Okay, I will keep an eye out and see if anything goes astray with it (I seriously hope not). Have a nice weekend, btw.
from do-agent.
Also confirming DO stock Ubuntu 20.04 build.
Have installed the beta release on one of the four affected boxes, and it's showing healthy, near-zero CPU. Thanks. Will monitor.
from do-agent.
Thanks all! I'm going to go ahead and release 3.8.0 on the stable branch as well.
from do-agent.
Related Issues (20)
- do-agent on FreeBSD does build but not run
- do-agent on FreeBSD exports only CPU, load, disk usage and bandwidth metric
- Agent incorrectly calculates memory usage
- do-agent --collector.<name> flags are confusing
- Support for Rocky Linux
- do-agent does not start on CentOS v7
- Package gnupg2 not available on Ubuntu20.04 and superior
- server certificate verification failed
- do-agent logging into /var/log/kern.log
- Some files have wrong UID
- Please provide a systemd timer as alternative to the crontab entry
- Ubuntu 22.04: apt-key is deprecated warning
- Updates change `/etc/passwd`
- New Issue Test
- RSA/SHA1 signature not allowed to be used on RHEL 9
- Key is stored in legacy trusted keyring
- gpg: no valid OpenPGP data found.
- test-issue
- Export host ssh key for use with doctl compute ssh