eero-t avatar eero-t commented on June 11, 2024

@hnez Please look into this, as it's from PR #4026.

@acmiyaguchi I'm still getting (valid looking) metrics from the Prometheus write plugin, but it's now logging complaints like this:

uc_update: Value too old: name = collectd_gpu_sysman_temperature_celsius{dev_file="card2",location="global-max",pci_bdf="0000:93:00.0",pci_dev="0x56c0"}; value time = 0.000; last cache update = 0.000;
uc_update: uc_update_metric failed: Error #-1; Additionally, strerror_r failed.

I compile/enable only one read + one write plugin, which may explain why it works for me.

For the write_http plugin crash, I would suggest running collectd under Valgrind memcheck tool (apt/dnf install valgrind & valgrind collectd <options>), see:

And for Prometheus write plugin missing data, I would suggest trying Valgrind threading error tools:

(I think the issue could relate to these plugins doing their own threading in addition to the threading now done by collectd, and that adds races when accessing the metrics.)

from collectd.

hnez avatar hnez commented on June 11, 2024

Hi @eero-t,

I've asked my team lead for some time to look into the "one thread per writer" fallout and got a slot later this week.
I hope I can fix the uc_ caching errors then and look into this issue.

hnez avatar hnez commented on June 11, 2024

Hi,

I've finally had time to look into these issues and you are right, both of them are on me. Oopsie.

write_http Segfault

The write_http plugin segfaulting was due to a wrong assumption on my side about who (plugin or daemon) is responsible for keeping the user_data_t around once it is passed to plugin_register_write.
I was under the assumption that it is the plugin's responsibility to keep the user_data_t around and that the daemon could just store a reference to it.

This is not the case.

The write_http plugin, for example, allocates the user_data_t on the stack and passes it to plugin_register_write; right afterwards the reference is no longer valid. The plugins I've tested for #4026 did not do it this way.

The behavior we observe before the segfault is the spooky action at a distance that comes from using stale references to a region on the stack.

The bug should be fixed by #4102.
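
To illustrate the lifetime problem described above, here is a minimal sketch of the "copy instead of keeping a reference" idea. It is not the code from #4102. The demo_* types and functions are hypothetical stand-ins, modeled only loosely on a user_data_t that carries a payload pointer and an optional free function.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a user_data_t-like struct: a payload pointer
 * plus an optional destructor. Not collectd's real definition. */
typedef struct {
  void *data;
  void (*free_func)(void *);
} demo_user_data_t;

/* Hypothetical registry entry. It holds the struct by value, so it does not
 * depend on how long the caller's copy lives. */
typedef struct {
  demo_user_data_t ud;
} demo_writer_t;

/* Copy the caller's struct instead of storing a pointer to it. The caller's
 * struct may live on the stack and disappear right after this call. */
static void demo_register_write(demo_writer_t *w, const demo_user_data_t *ud) {
  assert(w != NULL && ud != NULL);
  w->ud = *ud;
}

static int demo_payload = 42;

/* Mimics a plugin that builds the struct on its own stack frame,
 * registers it, and returns. */
static void demo_plugin_init(demo_writer_t *w) {
  demo_user_data_t ud = {.data = &demo_payload, .free_func = NULL};
  demo_register_write(w, &ud);
} /* ud is gone here; w->ud stays valid because it is a copy */
```

Had demo_writer_t stored a `const demo_user_data_t *` instead, the pointer would refer to a dead stack frame as soon as demo_plugin_init returned, which is the kind of stale reference described above.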

Another write_http Segfault

I've also noticed that write_http segfaults on teardown due to a use-after-free caused by the user_data_t once again.

This should be fixed by #4104.
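
The thread does not spell out the exact teardown path, so the following is only a generic sketch of one way to avoid this class of bug: release the payload in a single place and clear the fields afterwards, so that a second teardown call does nothing. All demo_* names are hypothetical and are not taken from #4104.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a user_data_t-like struct. */
typedef struct {
  void *data;
  void (*free_func)(void *);
} demo_user_data_t;

/* Counts destructor calls so the behavior can be checked. */
static int demo_free_calls = 0;

static void demo_counting_free(void *p) {
  assert(p != NULL);
  demo_free_calls++;
  free(p);
}

/* Release the payload at most once. Clearing the fields afterwards turns
 * any later call on the same struct into a no-op instead of a second free. */
static void demo_user_data_release(demo_user_data_t *ud) {
  if (ud == NULL)
    return;
  if (ud->data != NULL && ud->free_func != NULL)
    ud->free_func(ud->data);
  ud->data = NULL;
  ud->free_func = NULL;
}
```

This only protects callers that share the same struct instance. If several copies of the struct point at the same payload, one of them still has to be designated as the owner.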

write_prometheus does not register metrics

This was caused by the time and interval not being set up before calling uc_update, the same problem that @eero-t observed.

This should be fixed by #4103.
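
As a rough illustration of why an unset timestamp leads to "Value too old" complaints like the one logged earlier in this thread (value time = 0.000; last cache update = 0.000), here is a toy model. The demo_* struct, the defaults helper, and the cache check are all assumptions made for this sketch; they are not collectd's metric type or the real uc_update logic.

```c
#include <assert.h>
#include <time.h>

/* Hypothetical metric with a timestamp and an interval; 0 means "not set". */
typedef struct {
  double value;
  time_t time;
  time_t interval;
} demo_metric_t;

/* Fill in the fields the cache relies on before handing the metric over. */
static void demo_metric_fill_defaults(demo_metric_t *m, time_t now,
                                      time_t default_interval) {
  assert(m != NULL);
  if (m->time == 0)
    m->time = now;
  if (m->interval == 0)
    m->interval = default_interval;
}

/* Toy cache check: an update is rejected unless it is strictly newer than
 * the last one. Returns 0 on success, -1 for a "value too old" rejection. */
static int demo_cache_update(time_t *last_update, const demo_metric_t *m) {
  if (m->time <= *last_update)
    return -1;
  *last_update = m->time;
  return 0;
}
```

In this model a metric whose time was never set carries 0, which is never newer than an empty cache's 0, so every update is rejected until the defaults are filled in.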

Results

With all three patches applied my test script shows all three plugins working.

It would be great if you could test the changes as well and comment here / in the respective PRs.

Best regards
Leonard

eero-t avatar eero-t commented on June 11, 2024

Verified that the PR for the "write_prometheus" plugin issue fixes it (and the PR looks otherwise OK).

I'm not using the "write_http" plugin, so somebody else needs to check those PRs, but it's interesting that the plugin itself also had an unsafe assumption that needed to be fixed. So there may be other write plugins with similar assumptions that got broken.

@hnez in #4026 you mention testing only the write_throttle and "logfile" (write_log?) plugins. Maybe you could also check some other, simpler write plugin(s)?

$ ls src/write*.c
src/write_graphite.c      src/write_log.c         src/write_riemann.c            src/write_syslog.c
src/write_http.c          src/write_mongodb.c     src/write_riemann_threshold.c  src/write_tsdb.c
src/write_influxdb_udp.c  src/write_prometheus.c  src/write_sensu.c
src/write_kafka.c         src/write_redis.c       src/write_stackdriver.c

Note: the riemann write plugin is already buggy in the main branch, see #4050.

eero-t avatar eero-t commented on June 11, 2024

The other 2 fixes are merged, but this is still pending:

I've also noticed that write_http segfaults on teardown due to a use-after-free caused by the user_data_t once again. This should be fixed by #4104.

@mrunge?

octo avatar octo commented on June 11, 2024

#4104 and #4117 have been superseded by #4176

octo avatar octo commented on June 11, 2024

I think this has been fixed. Please re-open if you're still experiencing problems.