
Comments (15)

ihard commented on June 18, 2024

@kissken Added to the first post.

from moira.

ihard commented on June 18, 2024

Reporting the memory consumed by each trigger check, either in debug mode or in metrics, would greatly help in analyzing problems like this.

ihard commented on June 18, 2024

Updating to 2.9.0 revealed about 100 triggers with division by zero; these were removed.
During the update to 2.9.0, the following commands were run, which removed all metrics from Redis:

cli -remove-all-metrics
cli -cleanup-last-checks
cli -cleanup-metrics
cli -cleanup-retentions
cli -cleanup-tags

Triggers that logged the following warning have also been removed:
target t2 declared as alone metrics target but do not have any metrics and saved state in last check
Now the checker logs are clean, but the problem persists.

Patterns page:
[screenshot: 2023-11-01_18-30-04-2]

almostinf commented on June 18, 2024

Yes, the cli command will clean up the current garbage; the fix for the leak itself will come in the next release. We recommend running the cli with --cleanup-metrics in a regular cronjob.

kissken commented on June 18, 2024

@ihard Hi, could you please tell me whether the queue graph was taken at the moment of the problem?

Could you also add the following graphs, please?

aliasByNode(keepLastValue(nonNegativeDerivative(*.*.moira.*.checker.metricEventsHandle.count_ps), 1), 2)

keepLastValue(movingAverage(nonNegativeDerivative(*.*.moira.*.checker.local.triggers.count_ps), '5min'), 1)

aliasByNode(*.*.moira.*.checker.metricEventsHandle.95-percentile, 2)

aliasByNode(*.*.moira.*.checker.local.triggers.95-percentile, 2)

ihard commented on June 18, 2024

The pprof heap profile points to one problem function:

14676.30MB 96.79% 96.79% 14676.30MB 96.79% github.com/moira-alert/moira/metric_source.MakeEmptyMetricData (inline)
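
To get a feel for how placeholder series could add up to this much memory, here is a rough back-of-the-envelope sketch. It assumes (this is not taken from Moira's source) that an empty metric placeholder holds one 8-byte float per step of the checked window. The function name and the input numbers are hypothetical.

```python
def estimate_empty_series_bytes(metric_count, window_seconds, step_seconds, bytes_per_point=8):
    """Rough lower bound, in bytes, for placeholder series filled with empty values.

    Assumption: each metric gets one point per step inside the window,
    and each point is stored as an 8-byte float.
    """
    # Ceiling division: one point for every step that starts inside the window.
    points_per_metric = -(-window_seconds // step_seconds)
    return metric_count * points_per_metric * bytes_per_point


if __name__ == "__main__":
    # Hypothetical inputs: 250,000 metrics, a 1-hour window, a 60-second step.
    total = estimate_empty_series_bytes(250_000, 3600, 60)
    print(f"{total / 1024**2:.1f} MiB per pass over all metrics")
```

The estimate ignores slice growth, metric names, and garbage that has not been collected yet, so real usage would likely be higher.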

kissken commented on June 18, 2024

@ihard Could you please add whether the triggers are tagged or flat?
Also, do your triggers use system metrics that are not aggregated by pod?

I'm just guessing here: does the problem appear when many pods go down and many come up within one trigger, or not?

ihard commented on June 18, 2024

99% are flat triggers.
The tagged triggers all have fewer than 200 metrics.
Yes, there are some triggers with system metrics, and their metric names are almost always constant.
The problem is not related to mass starts or stops of pods; it appears continuously, around the clock.
It is more likely that some trigger (or several) reaches the checker and causes very high memory consumption; the process then crashes, and the trigger or triggers move to another node.

kissken commented on June 18, 2024

Hello, could you please tell us how many metrics match a pattern when you open the /patterns page and sort by the metrics field in descending order?

ihard commented on June 18, 2024

About 100,000 at the time the problem was analyzed; the number has since grown to about 250,000.
Also, the patterns page currently does not load. The interface shows this error:
Load failed
The API log shows:
{"level":"info","module":"api","context":"http","http.method":"GET","http.uri":"http://127.0.0.1:7092/api/pattern","http.protocol":"HTTP/1.0","http.remote_addr":"10.225.88.101:50190","username":"anonymous","http.status":200,"http.content_length":377215140,"elapsed_time_ms":3478,"elapsed_time":"3.478224001s","time":"2023-10-09 22:46:16.994","message":"GET http://127.0.0.1:7092/api/pattern/ HTTP/1.0"}

ihard commented on June 18, 2024

We removed a large batch of triggers that were not receiving any metrics, and the problem stopped.

almostinf commented on June 18, 2024

In addition, we found a potential issue that may be causing the increased memory consumption: the moira-pattern-metrics key is not being cleared, which leads to many unnecessary requests to Redis.

Temporary fix for the problem:

  1. SET DEL in triggers automatically clears this key for the trigger
  2. Manually deleting metrics, or deleting all NODATA metrics, also clears the key for the trigger

ihard commented on June 18, 2024

That looks very similar to our problem: after deleting the metrics, everything goes back to normal for a while, but then the problem returns.

almostinf commented on June 18, 2024

Hi! We added cleanup of the moira-pattern-metrics key to the cli command --cleanup-metrics. This will greatly reduce the load on the checker and the resources it consumes, since the leaked moira-pattern-metrics key generated unnecessary database queries.

ihard commented on June 18, 2024

Will the cli clean the keys in the current database, and has the leak itself already been fixed in some release?
