Comments (3)

nevinxie commented on June 4, 2024

We need to develop a monitoring plugin. I suggest calling it wecube-monitoring.
Do you like it?

from open-monitor.

zgyzgyhero commented on June 4, 2024

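Initial idea: a simple monitoring architecture
1 Prometheus HA (active-active), exposing an API to the web server for historical data queries and historical alert queries
2 local storage
3 Consul service discovery: each exporter is registered with tags, which Prometheus uses for scraping and which also back the API the web server uses for resource search
4 Alertmanager HA (active-active), providing the alert notification channels
5 Web server for API integration and alert configuration, backed by MySQL. Below are some planned features and their data dependencies:
- 5.1 - Resource search -> consul
- 5.2 - Historical data query -> prometheus
- 5.3 - Historical alert query -> prometheus
- 5.4 - Alert configuration -> mysql -> manage config files -> prometheus hot-reloads the configuration
- 5.5 - Alert recipients -> wecube users -> mysql -> alertmanager
- 5.6 - Chart display configuration -> mysql
Data flow diagram:
image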
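
To make the data dependencies in 5.1 to 5.3 concrete, here is a minimal Python sketch of the URLs the web server might build. The base addresses, function names, and the 60s default step are assumptions for illustration; the paths are the standard Consul catalog endpoint and the Prometheus `query_range` endpoint, and the alert history relies on the built-in `ALERTS` series that Prometheus records for pending and firing alerts.

```python
from urllib.parse import quote, urlencode

# Hypothetical base addresses; replace with the real deployment's.
CONSUL = "http://consul.example:8500"
PROMETHEUS = "http://prometheus.example:9090"


def resource_search_url(service: str, tag: str) -> str:
    """5.1: list the instances of a Consul service that carry a given tag."""
    return f"{CONSUL}/v1/catalog/service/{quote(service)}?{urlencode({'tag': tag})}"


def history_query_url(promql: str, start: int, end: int, step: str = "60s") -> str:
    """5.2: range query against the Prometheus HTTP API (Unix timestamps)."""
    params = {"query": promql, "start": start, "end": end, "step": step}
    return f"{PROMETHEUS}/api/v1/query_range?{urlencode(params)}"


def alert_history_url(start: int, end: int, step: str = "60s") -> str:
    """5.3: past firing alerts, read from the built-in ALERTS series."""
    return history_query_url('ALERTS{alertstate="firing"}', start, end, step)
```

For example, `history_query_url("up", 0, 60)` yields a `query_range` URL for the `up` metric over the first minute of the epoch. Actually issuing the requests and parsing the JSON responses is left out of the sketch.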


zgyzgyhero commented on June 4, 2024

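@chaneyliu
The two questions from last time:
1 If two Prometheus instances scrape independently, will their data diverge, so that their alerts and historical data views differ?
-> Use primary/standby instead, and use the data endpoint that supports federation as the source from which the standby syncs data from the primary. That way the data is scraped only once, and primary and standby stay as consistent as possible. This approach needs the following extra work:
- 1 - A service is needed to check the availability of the primary node. When it detects that the primary is unavailable, it must modify the configuration file, or switch to a second configuration prepared in advance. The two configurations differ because a standby that uses the federation endpoint should not also pull from the nodes' exporters, so when the standby is promoted to primary it has to load the other configuration file. After changing the configuration, the service must trigger a reload or restart the service.
This kind of primary/standby switchover mainly happens when the primary is unavailable at the OS level. If the primary is only unavailable at the application/container level, first try to recover by restarting it; the interval from detection to the application restart is roughly within one to two minutes. The check service should also set a maximum duration: if max time is exceeded after N restarts, it should actively switch to the standby.
- 2 - Both primary and standby need to report their own health and the health of the peer node, and an alert should notify the system administrator when something goes wrong.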
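
The switchover rule described above can be sketched as a small decision function in Python. This is only an illustration under assumptions: the names `FailoverPolicy` and `next_action`, the default values for N and max time, and the three action strings are hypothetical, and the sketch switches as soon as either the restart count or the downtime budget is exhausted, which is one reading of the rule.

```python
from dataclasses import dataclass


@dataclass
class FailoverPolicy:
    max_restarts: int = 3          # "N" above; the value is an assumption
    max_downtime_s: float = 120.0  # "max time" above; the value is an assumption


def next_action(os_reachable: bool, app_healthy: bool,
                restarts_done: int, downtime_s: float,
                policy: FailoverPolicy = FailoverPolicy()) -> str:
    """Return 'none', 'restart_primary', or 'promote_standby'."""
    if os_reachable and app_healthy:
        return "none"
    if not os_reachable:
        # OS-level failure: restarting the container cannot help.
        return "promote_standby"
    # Application-level failure: keep restarting while budget remains.
    if restarts_done >= policy.max_restarts or downtime_s > policy.max_downtime_s:
        return "promote_standby"
    return "restart_primary"
```

On `promote_standby`, the check service would then swap in the standby's direct-scrape configuration and reload it, as described in point 1; that step and the health reporting from point 2 are outside this sketch.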

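2 Will alerts be duplicated?
-> In the Prometheus code, an alert's labels identify it uniquely. When an alert fires, Prometheus sends it to Alertmanager, which manages all the alerts it receives and uses the alert identifier (a hash over the sorted labels) to decide whether an alert is a duplicate. So as long as the primary and standby Prometheus data is consistent, duplicate alerts will not occur.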
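
The idea of identifying an alert by a hash over its sorted labels can be illustrated with a short Python sketch. It is not the hashing Prometheus or Alertmanager actually uses: SHA-256 stands in for the real hash function, and the names `fingerprint` and `deduplicate` as well as the sample labels are assumptions.

```python
import hashlib


def fingerprint(labels: dict[str, str]) -> str:
    """Order-independent identifier for an alert's label set."""
    h = hashlib.sha256()  # stand-in hash, chosen only for illustration
    for name in sorted(labels):
        # 0xff never occurs in valid UTF-8, so it is a safe separator.
        h.update(name.encode() + b"\xff" + labels[name].encode() + b"\xff")
    return h.hexdigest()


def deduplicate(alerts: list[dict[str, str]]) -> list[dict[str, str]]:
    """Keep the first alert seen for each fingerprint."""
    seen: dict[str, dict[str, str]] = {}
    for labels in alerts:
        seen.setdefault(fingerprint(labels), labels)
    return list(seen.values())


# The same alert reported by primary and standby, labels in different order.
from_primary = {"alertname": "HighCPU", "instance": "10.0.0.1:9100"}
from_standby = {"instance": "10.0.0.1:9100", "alertname": "HighCPU"}
unique = deduplicate([from_primary, from_standby])
```

Because the labels are sorted before hashing, the two reports collapse into one entry. If the two instances attached different labels to the same condition, the fingerprints would differ and both alerts would be kept, which is why consistent data between primary and standby matters.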

