stefanprodan / swarmprom
Docker Swarm instrumentation with Prometheus, Grafana, cAdvisor, Node Exporter and Alert Manager
License: MIT License
In the Prometheus service discovery part, the names configured for DNS discovery take the form tasks.<servicename>:
scrape_configs:
  - job_name: 'node-exporter'
    dns_sd_configs:
      - names:
          - 'tasks.node-exporter'
        type: 'A'
        port: 9100
The names field should be a <domain_name>. I have no idea where the tasks.<servicename> form comes from. Does it come from your DNS configuration or from Docker swarm mode discovery?
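For what it's worth, the tasks.<servicename> form comes from Docker swarm mode itself, not from external DNS: for every service attached to an overlay network, the engine's embedded DNS server answers tasks.<servicename> with one A record per running task. Any service in the stack can therefore be scraped the same way; a sketch for a hypothetical service named my-app listening on port 8080:

```yaml
scrape_configs:
  - job_name: 'my-app'          # hypothetical service name
    dns_sd_configs:
      - names:
          - 'tasks.my-app'      # resolved by Swarm's embedded DNS (127.0.0.11)
        type: 'A'
        port: 8080
```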
Hi,
My cluster has 2 nodes, 1 manager and 1 worker.
In the swarm nodes dashboard I can see details for all the nodes (except CPU usage, which is missing for both nodes; is that normal?).
In the swarm services dashboard, I'm only seeing details from my worker node. When I explicitly select the master node, I don't see anything, as if it's not reading anything from my master.
Have you ever tried creating a rule such that if a node goes down, it throws an alert?
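A minimal sketch of such a rule, assuming the node-exporter job name used in this stack (the alert name, group name and timing are made up):

```yaml
groups:
  - name: node-down             # hypothetical group name
    rules:
      - alert: node_down
        expr: up{job="node-exporter"} == 0
        for: 1m
        annotations:
          summary: 'Node exporter target {{ $labels.instance }} is down'
```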
I have a 4-node swarm and the services dashboard shows only the services from the manager.
Also, it says I have only 1 node.
But if I go to the nodes dashboard I can see all my 4 nodes.
In the current stack the node-exporter services cannot capture the network traffic stats since they aren't attached to the host network.
If one does switch to use the host network then it works fine again but Prometheus cannot discover the exporters anymore.
Is there a way to support both discovery and host networking, or do I have to choose between the two features when using this stack?
When I log into :3000, at first I get a templating error:
Templating init failed
[object Object]
api/datasources/proxy/1/api/v1/query_range?query=sum(irate(node_cpu%7Bmode%3D%22idle%22%7D%5B30s%5D)%20*%20on(instance)%20group_left(node_name)%20node_meta%7Bnode_id%3D~%22.%2B%22%7D)%20*%20100%20%2F%20count_scalar(node_cpu%7Bmode%3D%22user%22%7D%20*%20on(instance)%20group_left(node_name)%20node_meta%7Bnode_id%3D~%22.%2B%22%7D)%20&start=1512715040&end=1512715100&step=1
All values in this command will be ignored:
ADMIN_USER=admin \
ADMIN_PASSWORD=admin \
SLACK_URL=https://hooks.slack.com/services/TOKEN \
SLACK_CHANNEL=devops-alerts \
SLACK_USER=alertmanager \
docker stack deploy -c docker-compose.yml mon
Ref. e.g.: the same effect occurs without the env_file: .env line, or with "$FOOVAR" in the actual command.
Tested on this Docker version:
Client:
Version: 17.12.0-ce
API version: 1.35
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:11:19 2017
OS/Arch: linux/amd64
Server:
Engine:
Version: 17.12.0-ce
API version: 1.35 (minimum version 1.12)
Go version: go1.9.2
Git commit: c97c6d6
Built: Wed Dec 27 20:09:53 2017
OS/Arch: linux/amd64
Experimental: true
Does not work
Hi
Thank you Stefan for this work. I have a question about Grafana. I try to access it at "127.0.0.1:3000" but it gives me this page.
I'm not able to access dashboards or anything else from Grafana. I'm not sure what I did wrong.
One other question, please: what should I do to access the collected metric values programmatically in Python? Should I use a specific library, or should I forward the collected metrics to a database and then access it from Python?
Regards
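On the Python question above: Prometheus exposes a plain HTTP API, so no special library or extra database is strictly needed. A hedged stdlib-only sketch; the URL, port, and metric names are assumptions to adjust for your setup:

```python
# Sketch: query the Prometheus HTTP API from plain Python (stdlib only).
import json
import urllib.parse
import urllib.request

PROMETHEUS_URL = "http://localhost:9090"  # assumption: your swarm IP and port

def build_query_url(base, promql, path="/api/v1/query"):
    """Build an instant-query URL for the Prometheus HTTP API."""
    return base + path + "?" + urllib.parse.urlencode({"query": promql})

def query(base, promql):
    """Run an instant query and return the list of result series."""
    with urllib.request.urlopen(build_query_url(base, promql)) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError("query failed: %r" % payload)
    return payload["data"]["result"]

# Usage (requires a reachable Prometheus):
# for series in query(PROMETHEUS_URL, 'up{job="node-exporter"}'):
#     print(series["metric"]["instance"], series["value"][1])
```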
Is there a way to automatically exclude all the container and service series that are related to the monitoring itself, i.e. everything that starts with mon_*?
PS. Thanks for putting together this awesome dashboard.
Caddy is free only for personal projects https://caddyserver.com/products/licenses
Generally speaking, SwarmProm is a great starting point. One issue we're running into implementing this solution, however, is that (at this point) there is no way to extend Prometheus to scrape other things.
For instance, we would like to monitor Traefik with Prometheus. (As an aside, you should look at replacing Caddy with Traefik in your stack; in my opinion it's an easier-to-configure traffic router than Caddy, with fewer random config files. YMMV.)
However, when I pull prometheus.yml out (create a docker config for it and add that config into the monitoring stack file), upon starting Prometheus we get:
"mv: can't rename '/tmp/prometheus.yml': Device or resource busy"
Meaning prometheus appears to already be running by the time Docker attempts to mount the prometheus.yml file into /etc/prometheus.
The only way to add to the scrape configs at this point is to download your Dockerfile / prometheus.yml and rebuild the Prometheus container, so the Prometheus included in this stack cannot really be extended to monitor other things.
Help a guy out? There's got to be a way to externalize the prometheus.yml file so that it can come in from docker configs (like the rules files do).
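One hedged workaround sketch: since the image's entrypoint rewrites /etc/prometheus at startup (hence the "Device or resource busy" error when a config is mounted there), mount the Swarm config at a different path and point --config.file at it. The config name and paths below are assumptions, not the stack's defaults:

```yaml
configs:
  prometheus_conf:                      # hypothetical config name
    file: ./prometheus/conf/prometheus.yml

services:
  prometheus:
    image: stefanprodan/swarmprom-prometheus
    command:
      - '--config.file=/etc/prometheus-ext/prometheus.yml'  # outside /etc/prometheus
      - '--storage.tsdb.path=/prometheus'
    configs:
      - source: prometheus_conf
        target: /etc/prometheus-ext/prometheus.yml
```

Note that this bypasses whatever env-var templating the image's entrypoint performs on the bundled config, so the file must be complete on its own.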
Good day. And thanks for the great project. I really admire this one.
I run your stack on a cluster with 1 manager and 2 workers. Everything looks good, but in the Prometheus dashboard I see the following:
As you describe here, I updated /etc/docker/daemon.json and restarted the docker service:
{
  "experimental": true,
  "metrics-addr": "0.0.0.0:9323"
}
I checked my DOCKER_GWBRIDGE_IP:
$ ip -o addr show docker_gwbridge
3: docker_gwbridge inet 172.18.0.1/16 brd 172.18.255.255 scope global docker_gwbridge\ valid_lft forever preferred_lft forever
If I curl this endpoint with the following IPs, everything works:
$ curl http://172.18.0.1:9323/metrics
$ curl http://0.0.0.0:9323/metrics
$ curl http://localhost:9323/metrics
But in Prometheus, the dockerd-exporter target statuses are always down.
$ docker service logs mon_dockerd-exporter
mon_dockerd-exporter.0.ofok9t4isfk9@node-1 | Activating privacy features... done.
mon_dockerd-exporter.0.ofok9t4isfk9@node-1 | http://:9323
mon_dockerd-exporter.0.ofok9t4isfk9@node-1 | 03/Apr/2018:07:36:34 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.ofok9t4isfk9@node-1 | 03/Apr/2018:07:36:49 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.ofok9t4isfk9@node-1 | 03/Apr/2018:07:37:04 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.ofok9t4isfk9@node-1 | 03/Apr/2018:07:37:19 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.ofok9t4isfk9@node-1 | 03/Apr/2018:07:37:34 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.ofok9t4isfk9@node-1 | 03/Apr/2018:07:37:49 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2 | Activating privacy features... done.
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2 | http://:9323
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2 | 03/Apr/2018:07:36:37 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2 | 03/Apr/2018:07:36:52 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2 | 03/Apr/2018:07:37:07 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2 | 03/Apr/2018:07:37:22 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2 | 03/Apr/2018:07:37:37 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.rfzakmu3h1ml@node-2 | 03/Apr/2018:07:37:52 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3 | Activating privacy features... done.
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3 | http://:9323
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3 | 03/Apr/2018:07:36:36 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3 | 03/Apr/2018:07:36:51 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3 | 03/Apr/2018:07:37:06 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3 | 03/Apr/2018:07:37:21 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3 | 03/Apr/2018:07:37:36 +0000 [ERROR 502 /metrics] context canceled
mon_dockerd-exporter.0.z9ud5s9c2u2s@node-3 | 03/Apr/2018:07:37:51 +0000 [ERROR 502 /metrics] context canceled
Swarmprom is successfully running on an Ubuntu machine.
Currently it is not showing worker nodes.
Kindly assist me.
Hi !
The alertmanager container is stuck in an endless loop of starting and exiting straight away. These are the logs that I can get from the container:
time="2017-10-02T15:13:34Z" level=info msg="Starting alertmanager (version=0.8.0, branch=HEAD, revision=74e7e48d24bddd2e2a80c7840af9b2de271cc74c)" source="main.go:109"
time="2017-10-02T15:13:34Z" level=info msg="Build context (go=go1.8.3, user=root@439065dc2905, date=20170720-14:14:06)" source="main.go:110"
time="2017-10-02T15:13:34Z" level=info msg="Loading configuration file" file="/etc/alertmanager/alertmanager.yml" source="main.go:234"
time="2017-10-02T15:13:34Z" level=error msg="Loading configuration file failed: no global Slack API URL set" file="/etc/alertmanager/alertmanager.yml" source="main.go:237"
I've set the env variables ADMIN_USER, ADMIN_PASSWORD, SLACK_URL, SLACK_CHANNEL and SLACK_USER, and I don't know what else to do to make this work properly.
Hi Stefan,
Need your advice please to understand how to monitor whether a container is not running (the reasons could be that someone deleted the container, it crashed, etc.).
E.g. I've a Kafka cluster with 3 ZooKeeper nodes and 3 Kafka nodes. I want to be alerted if any of the Kafka or ZooKeeper nodes goes down or is not responding.
Since in your setup I can't put additional configs in prometheus.yml, how can I create such rules with the rules file?
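One possible sketch for the Kafka/ZooKeeper case, using the cAdvisor container labels the other rules in this stack rely on; the service name pattern, group name and timing are assumptions:

```yaml
groups:
  - name: task-alerts           # hypothetical group name
    rules:
      - alert: kafka_service_missing
        # Fires when cAdvisor no longer reports any container for the service.
        expr: absent(container_last_seen{container_label_com_docker_swarm_service_name=~".*kafka.*"})
        for: 2m
        annotations:
          summary: 'No running kafka task has been seen for 2 minutes'
```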
Hi,
The caddy server is not starting. When I do a docker service ls, I see all the services as started, with caddy alone having 0/1 replicas.
I did an inspect and it doesn't show any error, and there is no log output from the container either. When I remove the stack and redeploy it, sometimes the health of the caddy container is "starting" and sometimes it's "unhealthy".
I'm running this on an Ubuntu 16.04 node with the latest Docker version.
Wow, I'm surprised to see this stack. I was thinking about migrating your dockprom project :)
As I see, you are working actively on it. Do you consider it ready for other folks to try?
I have tried to use Prometheus to monitor two Docker swarms together, following your swarmprom guide.
Since Prometheus is not in the same overlay network as the monitored nodes, I tried to use static_configs instead of dns_sd_configs:
- job_name: 'prometheus'
  static_configs:
- job_name: 'node-exporter'
  static_configs:
The node_meta from http://infbjvm223.cn.oracle.com:9100/metrics is:
node_meta{container_label_com_docker_swarm_node_id="wx86gspnvhgdli8kq0k93m392",node_id="wx86gspnvhgdli8kq0k93m392",node_name="infbjvm223.cn.oracle.com"} 1
But from the Prometheus console, executing node_meta shows 4 metrics, with mismatched instances and node metadata:
node_meta{container_label_com_docker_swarm_node_id="n9x7iwqhqe51y80c00a5c16fd",instance="infbjvm223.cn.oracle.com:9100",job="node-exporter",node_id="n9x7iwqhqe51y80c00a5c16fd",node_name="infbjsrv35.cn.oracle.com"} | 1
node_meta{container_label_com_docker_swarm_node_id="n9x7iwqhqe51y80c00a5c16fd",instance="infbjsrv35.cn.oracle.com:9100",job="node-exporter",node_id="n9x7iwqhqe51y80c00a5c16fd",node_name="infbjsrv35.cn.oracle.com"} | 1
node_meta{container_label_com_docker_swarm_node_id="wx86gspnvhgdli8kq0k93m392",instance="infbjvm223.cn.oracle.com:9100",job="node-exporter",node_id="wx86gspnvhgdli8kq0k93m392",node_name="infbjvm223.cn.oracle.com"} | 1
node_meta{container_label_com_docker_swarm_node_id="wx86gspnvhgdli8kq0k93m392",instance="infbjsrv35.cn.oracle.com:9100",job="node-exporter",node_id="wx86gspnvhgdli8kq0k93m392",node_name="infbjvm223.cn.oracle.com"} | 1
I cannot understand why this happens, and why dns_sd_configs collects the right node metadata.
Can you help me?
The current Prometheus dashboard doesn't work since it's made for Prom 1.x
Hello,
I'm new to Docker, Prometheus and Grafana, trying to learn the basics. I followed the steps described in this repository. I have no problem reaching Grafana and Alertmanager at <swarm_ip>:xxxx, but when I try to reach Prometheus at <swarm_ip>:9090 I get a 502 Bad Gateway error. Unfortunately I couldn't find documentation on Prometheus errors.
PS: Thanks for the great tutorial.
In my case it is possible to manually define the hosts to scrape (with hostnames) because they normally do not change.
Then I simply mapped the cAdvisor and node_exporter ports to the host machine so I can combine docker, cAdvisor and node_exporter metrics.
Is this a good, bad or ugly way?
Just an idea...
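For comparison, the static variant described above can be sketched like this (hostnames are placeholders):

```yaml
scrape_configs:
  - job_name: 'node-exporter'
    static_configs:
      - targets:
          - 'node1.example.com:9100'   # placeholder hostnames
          - 'node2.example.com:9100'
  - job_name: 'cadvisor'
    static_configs:
      - targets:
          - 'node1.example.com:8080'
          - 'node2.example.com:8080'
```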
Hello,
can you help me? Why does the following default task_high_memory_usage_1g task rule:
- alert: task_high_memory_usage_1g
  expr: |
    sum(container_memory_rss{container_label_com_docker_swarm_task_name=~".+"})
    BY (container_label_com_docker_swarm_task_name, container_label_com_docker_swarm_node_id) > 1e+09
  for: 5m
  annotations:
    description: '{{ $labels.container_label_com_docker_swarm_task_name }} on ''{{ $labels.container_label_com_docker_swarm_node_id }}'' memory usage is {{ humanize $value }}.'
    summary: Memory alert for Swarm task '{{ $labels.container_label_com_docker_swarm_task_name }}' on '{{ $labels.container_label_com_docker_swarm_node_id }}'
appear in Slack like below, with no description or other annotations?
task_high_cpu_usage_50 task rule appears correctly:
Thank you.
Dashboard and datasource are no longer included after login. There were no changes to the repo; I'm just doing a regular docker stack deploy.
Hi, I'd like to use Prometheus in swarm mode. It is not clear to me whether I need to add Consul to the compose file, or whether Consul and Registrator are already present inside this bundle. In the first case, are there particular settings to add in Prometheus?
I'm new to using Prometheus and I would really appreciate some help. I've been looking into this issue for quite a bit. I have a swarm of machines with 1 manager and 7 workers. The manager is on a digital ocean instance and the workers are physical machines on my local network.
The problem is when I go to the Grafana dashboard only 1 node is being detected. When I visit the prometheus targets url at port 9090, I see 8 endpoints but only 1 is up. The rest have an error that says "context_deadline_exceeded".
On each machine, I have set the metrics address to 0.0.0.0:9323 and experimental mode is set to true. I have also enabled port 2376 on the machines, 7946, and 4789.
Any suggestions to get metrics for the other nodes is much appreciated. Thank you!
Please help to resolve this issue.
Using experimental mode and 0.0.0.0:9323 pretty much exposes the port to the public. Is there another, more secure way to export this without showing it to everyone?
In the current setup, we have 3 nodes and 1 master.
All nodes are visible properly in Grafana, but the master is not visible in Grafana.
Please help me to resolve this issue.
Thanks in advance :-)
I have deployed swarmprom on a 3-node cluster on Docker for AWS. All nodes are masters and are running fine, but only 2 nodes are listed in Grafana; a couple of my app stacks are also missing.
All the swarmprom services seem to run fine though.
Any hints?
Btw, thanks a lot, really great project!
I've tried this a few times, logged in and verified that the Docker swarm nodes and services dashboards are present in the /etc/grafana/dashboards directory; however, it never sees them for import.
When I manually import the json files, they result in completely blank dashboards.
How do I disable the basic authentication that is now required for me to log in? I understand that the caddy service is responsible for authentication, but I can't figure out how to disable it. Any idea?
Hi
I'm trying to store prometheus metrics in postgresql based on prometheus-postgresql-adapter. I modified the docker-compose.yml to the docker-compose-pg-old.yml.pdf (which includes 2 additional services corresponding to the first 2 containers in prometheus-postgresql-adapter, and comments out the local storage for prometheus). The prometheus.yml is modified as shown in the prometheus.yml.pdf to direct "read" and "write" to postgresql. I had to build the prometheus docker image to include the modified prometheus.yml.
The stack is deployed under the name "mon". The mon_prometheus should connect to the "mon_prometheus_postgresql_adapter", which in turn connects to mon_pg_prometheus (the PostgreSQL database). The problem is that the "mon_prometheus" service is unable to connect to the "mon_prometheus_postgresql_adapter". The logs from "mon_prometheus" say:
level=error ts=2018-02-20T04:13:33.284782524Z caller=engine.go:544 component="query engine" msg="error selecting series set" err="error sending request: Post http://mon_prometheus_postgresql_adapter:9201/read: dial tcp: lookup mon_prometheus_postgresql_adapter on 127.0.0.11:53: no such host"
Regards
How do I disable basic authentication? I understand that the caddy service is responsible for authentication. How do I bypass it? Any help? Thanks.
I am using this repo to create a monitoring stack for our production swarm environments.
I have made some changes to the Prometheus configuration.
Can you please help me to fix this problem?
I could deploy all services, but I am getting the error below in the prometheus container:
deb795407a (none))"
level=info ts=2018-03-07T17:07:38.10631854Z caller=main.go:228 fd_limits="(soft=1048576, hard=1048576)"
level=info ts=2018-03-07T17:07:38.109652503Z caller=main.go:502 msg="Starting TSDB ..."
level=info ts=2018-03-07T17:07:38.127573843Z caller=web.go:383 component=web msg="Start listening for connections" address=0.0.0.0:9090
level=info ts=2018-03-07T17:07:38.574693038Z caller=main.go:512 msg="TSDB started"
level=info ts=2018-03-07T17:07:38.574933556Z caller=main.go:588 msg="Loading configuration file" filename=/etc/prometheus/prometheus.yml
level=info ts=2018-03-07T17:07:38.578334416Z caller=main.go:489 msg="Server is ready to receive web requests."
level=warn ts=2018-03-07T17:08:05.313728189Z caller=main.go:366 msg="Received SIGTERM, exiting gracefully..."
level=info ts=2018-03-07T17:08:05.313788495Z caller=main.go:390 msg="Stopping scrape discovery manager..."
level=info ts=2018-03-07T17:08:05.3138142Z caller=main.go:403 msg="Stopping notify discovery manager..."
level=info ts=2018-03-07T17:08:05.313828264Z caller=main.go:427 msg="Stopping scrape manager..."
level=info ts=2018-03-07T17:08:05.313855348Z caller=main.go:386 msg="Scrape discovery manager stopped"
level=info ts=2018-03-07T17:08:05.313893078Z caller=main.go:399 msg="Notify discovery manager stopped"
level=info ts=2018-03-07T17:08:05.31401654Z caller=main.go:421 msg="Scrape manager stopped"
level=info ts=2018-03-07T17:08:05.317560586Z caller=manager.go:460 component="rule manager" msg="Stopping rule manager..."
level=info ts=2018-03-07T17:08:05.317627258Z caller=manager.go:466 component="rule manager" msg="Rule manager stopped"
level=info ts=2018-03-07T17:08:05.31764061Z caller=notifier.go:493 component=notifier msg="Stopping notification manager..."
level=info ts=2018-03-07T17:08:05.317659353Z caller=main.go:573 msg="Notifier manager stopped"
level=info ts=2018-03-07T17:08:05.317714607Z caller=main.go:584 msg="See you next time!"
docker@manager:/Users/gaurav.goyal/gg/swarmprom/prometheus/conf$ cat prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: 'promswarm'

rule_files:
  - "swarm_node.rules.yml"
  - "swarm_task.rules.yml"

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'cadvisor'
    dns_sd_configs:
      - names:
          - 'tasks.cadvisor'
        type: 'A'
        port: 8080

  - job_name: 'node-exporter'
    dns_sd_configs:
      - names:
          - 'tasks.node-exporter'
        type: 'A'
        port: 9100

  - job_name: 'grafana'
    dns_sd_configs:
      - names:
          - 'tasks.grafana'
        type: 'A'
        port: 3000
FROM prom/prometheus:v2.2.0-rc.0
COPY conf/ /etc/prometheus/
#ENTRYPOINT [ "/etc/prometheus/docker-entrypoint.sh" ]
CMD [ "--config.file=/etc/prometheus/prometheus.yml",
"--storage.tsdb.path=/prometheus",
"--web.console.libraries=/usr/share/prometheus/console_libraries",
"--web.console.templates=/usr/share/prometheus/consoles" ]
Hi,
Is it possible with swarmprom to monitor the HTTP status codes of my web applications? I'd like to get a Slack notification if my application doesn't return an HTTP 200 code.
Thanks!
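Swarmprom itself does not probe HTTP endpoints; the usual approach is Prometheus's blackbox_exporter. A sketch, assuming a blackbox-exporter service is added to the stack; the service name, module and target URL are assumptions:

```yaml
scrape_configs:
  - job_name: 'blackbox'
    metrics_path: /probe
    params:
      module: [http_2xx]               # probe succeeds only on HTTP 2xx
    static_configs:
      - targets:
          - 'http://my-app:8080/'      # hypothetical application URL
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 'blackbox-exporter:9115'  # hypothetical exporter service
```

An alert on probe_success == 0 could then be routed to Slack through the stack's existing Alertmanager config.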
I'm having trouble scraping data outside of the swarm. I don't get any errors, but no data shows up. Here is my prometheus.yml; it's the default file with very minor changes. Any thoughts?
global:
  scrape_interval: 15s
  evaluation_interval: 15s
  external_labels:
    monitor: 'promswarm'

rule_files:

alerting:
  alertmanagers:

scrape_configs:
  - job_name: 'prometheus'
    static_configs:

  - job_name: 'dockerd-exporter'
    dns_sd_configs:

  - job_name: 'cadvisor'
    dns_sd_configs:

  - job_name: 'node-exporter'
    dns_sd_configs:

  - job_name: 'perforce_node_exporter'
    scrape_interval: 30s
    static_configs:
I've been reading the README and it doesn't state WHERE to put the ADMIN vars. No file is given in the README. I see them referenced in the code, but I see no place to set them and can only ASSUME we set them as bash env variables.
But looking at issue #2 (#2), it looks like we don't... and it doesn't state WHAT FILE to declare them in.
Can someone clarify this in the documentation?
I'm trying to monitor a few nodes outside the swarm cluster but am unable to reach those nodes from inside the prometheus container.
Man, if there is a better way to use "sum(node_memory_MemAvailable * on(instance) group_left(node_id, node_name) node_meta) by (node_id, node_name)", I'll appreciate it; maybe some metric relabeling? Thanks.
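Metric relabeling alone cannot join labels from another metric, so one common alternative is a recording rule: the join is written once and dashboards query the precomputed series. Rule and group names below are made up:

```yaml
groups:
  - name: node-recording          # hypothetical group name
    rules:
      - record: node:memory_available_bytes:sum
        expr: |
          sum(node_memory_MemAvailable * on(instance) group_left(node_id, node_name) node_meta)
          by (node_id, node_name)
```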
Hi Stefan,
I tried to follow the "https://github.com/stefanprodan/swarmprom#monitoring-applications-and-backend-services" to monitor kafka and MySQL services using Prometheus provided exporters for these tools.
Eg. this one for MySQL
https://github.com/prometheus/mysqld_exporter
I configured this in the docker-compose file:
environment:
  - JOBS=kafka-exporter:9308 mysql-exporter:9104
Now I can see the metrics from the web browser. But my Prometheus is not scraping any metrics from them.
So I have some confusion here.
Thanks for the help.
Regards,
Ashish
Prometheus 2 has already been released. Is it going to be supported instead of the current 1.8?
First, thanks for this nice stack!
For some reason the swarm nodes dashboard always shows the wrong number of nodes. It is correct in the services dashboard but not in swarm nodes. Any idea what it could be?
Hi, first of all, thank you for the job you're doing. When the stack is deployed, the access port to the web interface is 3000. How can it be changed to 80 (eventually 443)?
Thanks in advance for helping :)
When I try to monitor an application, for example Redis, I'm having issues.
My config:
*docker-compose.yml:
prometheus:
  image: stefanprodan/swarmprom-prometheus
  environment:
*prometheus.yml:
- job_name: 'redis-exporter'
  dns_sd_configs:
    - names:
        - 'tasks.redis-exporter'
      type: 'A'
      port: 9121
*compose-redis.yml:
version: '3'
networks:
  mon_net:
    external: true
services:
  redis:
    image: redis
    networks:
  redis-exporter:
    image: oliver006/redis_exporter
    networks:
When I run the monitoring stack and then compose-redis, Prometheus goes up and down all the time.
Log shows:
level=error ts=2018-02-19T16:49:15.594740858Z caller=main.go:582 err="Error loading config couldn't load configuration (--config.file=/etc/prometheus/prometheus.yml): parsing YAML file /etc/prometheus/prometheus.yml: unknown fields in alertmanager config: job_name"
I have no idea how to fix this or what I did wrong.
Any help would be appreciated.
Sorry for posting in the wrong place at first.
Thanks
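For reference, the "unknown fields in alertmanager config: job_name" error above usually means the new job was indented under alerting instead of scrape_configs. A sketch of the expected layout, matching the stack's defaults (exact indentation matters):

```yaml
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093

scrape_configs:
  - job_name: 'redis-exporter'    # must sit under scrape_configs, not alerting
    dns_sd_configs:
      - names:
          - 'tasks.redis-exporter'
        type: 'A'
        port: 9121
```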
What did you do?
Deployed swarmprom in my Swarm cluster, logged into Grafana, and noticed that the available disk space exceeds 100%
What did you expect to see?
A value lower or at most equal to 100%
What did you see instead? Under which circumstances?
171%, every time
Is it a bug in the node-exporter data?
The df -h of the first of the two nodes is:
[msadmin@MS-DSC1 ~]$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda2 30G 4.1G 26G 14% /
devtmpfs 3.9G 0 3.9G 0% /dev
tmpfs 3.9G 0 3.9G 0% /dev/shm
tmpfs 3.9G 377M 3.6G 10% /run
tmpfs 3.9G 0 3.9G 0% /sys/fs/cgroup
/dev/sda1 497M 105M 392M 22% /boot
/dev/sdb1 16G 45M 15G 1% /mnt/resource
//msshare.file.core.windows.net/msshare 5.0T 8.5M 5.0T 1% /mnt/msshare
tmpfs 797M 0 797M 0% /run/user/1000
and the second is virtually identical.
Thank you,
Roberto
It would be better, IMO, to just copy the alertmanager.yml file into the /tmp folder in the Dockerfile and have the entrypoint perform the file modifications as part of the copy.
If I try to add a docker config file at the path /etc/alertmanager/alertmanager.yml, I get the error:
mv: can't rename '/tmp/alertmanager.yml': Device or resource busy
Hello Stefan,
Great project + blog explaining the whole thing!!
Is there any chance to have this project working on a docker swarm build upon 5 raspberry pi 3 nodes?
Greetz,
Raymond
Hi,
I'm running Docker CE 17.12.0-ce in swarm mode with 3 nodes.
I have deployed swarmprom and everything works fine, except that I have no graphs in the Grafana swarm nodes dashboard.
Any idea?
My password had the character @ and this caused an error in the grafana_api function in docker-entrypoint.sh in the Grafana container.