Comments (15)
@kissken Added to the first post.
from moira.
Exposing the memory consumed by each trigger check, either in debug mode or as metrics, would greatly help in analyzing such problems.
from moira.
Updating to 2.9.0 revealed about 100 division-by-0 triggers that were removed.
When updating to 2.9.0, the following commands were run, which removed all metrics from Redis:
cli -remove-all-metrics
cli -cleanup-last-checks
cli -cleanup-metrics
cli -cleanup-retentions
cli -cleanup-tags
Triggers that logged the warning
target t2 declared as alone metrics target but do not have any metrics and saved state in last check
were also removed.
Now the checker logs are clean, but the problem persists.
from moira.
Yes, the cli command will clean up the current garbage; the leak fix will come in the next release. We recommend adding --cleanup-metrics to the cli call in a regular cron job.
from moira.
@ihard Hi, could you tell me, please, is the queue graph from the moment of the problem?
Could you also add the following graphs, please?
aliasByNode(keepLastValue(nonNegativeDerivative(*.*.moira.*.checker.metricEventsHandle.count_ps), 1), 2)
keepLastValue(movingAverage(nonNegativeDerivative(*.*.moira.*.checker.local.triggers.count_ps), '5min'), 1)
aliasByNode(*.*.moira.*.checker.metricEventsHandle.95-percentile, 2)
aliasByNode(*.*.moira.*.checker.local.triggers.95-percentile, 2)
from moira.
The pprof heap profile points to this function as the problem:
14676.30MB 96.79% 96.79% 14676.30MB 96.79% github.com/moira-alert/moira/metric_source.MakeEmptyMetricData (inline)
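To put the row above in perspective, here is a minimal sketch that splits it into the usual `pprof` top columns (flat, flat%, sum%, cum, cum%, function) and estimates the total heap covered by the profile from the flat size and its percentage. The helper names are made up for illustration, and the parser assumes sizes are reported in MB as in this row.

```python
# Row copied from the pprof heap output above.
# Columns of a pprof "top" row: flat, flat%, sum%, cum, cum%, function name.
PPROF_ROW = (
    "14676.30MB 96.79% 96.79% 14676.30MB 96.79% "
    "github.com/moira-alert/moira/metric_source.MakeEmptyMetricData (inline)"
)


def parse_top_row(row):
    """Split one pprof top row into its columns (assumes MB units)."""
    flat, flat_pct, sum_pct, cum, cum_pct, *name = row.split()
    return {
        "flat_mb": float(flat[:-2]),       # strip the "MB" unit
        "flat_pct": float(flat_pct[:-1]),  # strip the "%" sign
        "cum_mb": float(cum[:-2]),
        "function": " ".join(name),
    }


def estimated_total_mb(parsed):
    """flat% is flat / total, so total is flat / (flat% / 100)."""
    return parsed["flat_mb"] / (parsed["flat_pct"] / 100)


row = parse_top_row(PPROF_ROW)
print(row["function"])
print(f"heap covered by the profile: about {estimated_total_mb(row):.0f} MB")
```

In other words, a single function accounts for nearly all of a heap of roughly 15 GB.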
from moira.
@ihard Could you please add some info: are the triggers tagged or flat?
And do your triggers have system metrics that are not aggregated by pod?
I'm just guessing, but does the problem appear when many pods go down and many come up within one trigger, or not?
from moira.
99% are flat triggers
tagged triggers all have fewer than 200 metrics
yes, there are some triggers with system metrics whose names are almost always constant
The problem is not related to the mass start or stop of the pods and appears continuously for 24 hours
It’s more likely that some trigger or triggers get into the checker and cause large memory consumption; then the process crashes and the trigger or triggers move to another node.
from moira.
Hello, could you please tell us how many metrics match a pattern when you open the /patterns page and sort by the metrics field in descending order?
from moira.
~100,000 at the time we analyzed the problem; now the number has grown to ~250,000.
Also, right now the patterns page is not displayed; the interface shows an error:
Load failed
in api log:
{"level":"info","module":"api","context":"http","http.method":"GET","http.uri":"http://127.0.0.1:7092/api/pattern","http.protocol":"HTTP/1.0","http.remote_addr":"10.225.88.101:50190","username":"anonymous","http.status":200,"http.content_length":377215140,"elapsed_time_ms":3478,"elapsed_time":"3.478224001s","time":"2023-10-09 22:46:16.994","message":"GET http://127.0.0.1:7092/api/pattern/ HTTP/1.0"}
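The log entry above reports HTTP 200 but a very large body. A minimal sketch of the arithmetic, using a trimmed copy of that entry (only the fields needed here are kept), converts `http.content_length` to MiB. A response of this size may be why the page fails to load in the browser, though that is a guess rather than something the log confirms.

```python
import json

# Trimmed copy of the api log entry above; only the fields used here are kept.
LOG_LINE = (
    '{"level":"info","module":"api",'
    '"http.uri":"http://127.0.0.1:7092/api/pattern",'
    '"http.status":200,"http.content_length":377215140,'
    '"elapsed_time_ms":3478}'
)


def response_size_mib(log_line):
    """Return the response body size in MiB from a JSON log entry."""
    record = json.loads(log_line)
    return record["http.content_length"] / (1024 * 1024)


print(f"{response_size_mib(LOG_LINE):.1f} MiB")  # prints "359.7 MiB"
```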
from moira.
We removed a large batch of triggers that were receiving no metrics, and the problem stopped.
from moira.
In addition, we found a potential issue that may be causing the increased memory consumption: the moira-pattern-metrics key is not being cleared, which leads to a lot of unnecessary requests to Redis.
Temporary fixes for the problem:
- Setting DEL in triggers automatically clears this key for the trigger
- Manually deleting metrics, or using "delete all NODATA metrics", also clears the key for the trigger
from moira.
It's very similar to our problem: after deleting the metrics, everything goes back to normal for a while, but then the problem returns.
from moira.
Hi! We added cleanup of the moira-pattern-metrics key to the cli command --cleanup-metrics, which will greatly reduce the load on the checker and the resources it consumes, since the moira-pattern-metrics key leak generated unnecessary database queries.
from moira.
Will the cli clean the keys in the current database, and has the leak itself already been fixed in some release?
from moira.