Giter VIP home page Giter VIP logo

Comments (17)

fcrozat avatar fcrozat commented on August 17, 2024 2

I've dropped the offending patch and updated bees to 0.10. Should be available on openSUSE Tumbleweed soon: https://build.opensuse.org/request/show/1100400

from bees.

Zygo avatar Zygo commented on August 17, 2024

Can you provide:

  • stack traces with debug symbols?
  • log output with a higher -v level, or no -v option at all?

Ideally the first one, but the second one might still provide a clue.

from bees.

mischaelschill avatar mischaelschill commented on August 17, 2024

After compiling from git master (the tag didn't compile due to issues in limits.cc), bees no longer crashes, but these show up every minute:
vmalloc.txt

from bees.

mischaelschill avatar mischaelschill commented on August 17, 2024

With come context:
vmalloc2.txt

from bees.

mischaelschill avatar mischaelschill commented on August 17, 2024

I re-installed the prebuilt package with debug information and it crashes again:
stacktrace2.txt

from bees.

mischaelschill avatar mischaelschill commented on August 17, 2024

Another one, after again deleting .beeshome:
stacktrace3.txt

from bees.

mischaelschill avatar mischaelschill commented on August 17, 2024

I will let it run with -c1 to see whether it is some concurrency issue.

from bees.

mischaelschill avatar mischaelschill commented on August 17, 2024

running with -c1 didn't help

from bees.

mischaelschill avatar mischaelschill commented on August 17, 2024

Looking at the stacktraces, I can find two suspicious threads:
suspicious.txt
Two threads in BeesFdCache doing something. This happens in every case, most of the times the top frame is _close, in all cases it is under BeesFdCache5clear

from bees.

Zygo avatar Zygo commented on August 17, 2024

What compiler version/libstdc++ versions are these?

open_root is normally called millions of times per machine-day, and clear_cache a few thousand, always from different threads. If there were a race or unprotected critical section between those functions, I'd expect it to be causing crashes every day for everyone since 2018.

Maybe there's some subtle rule about iterator invalidation (or a new bug in the same) in recent libstdc++?

from bees.

mischaelschill avatar mischaelschill commented on August 17, 2024

From what I can find in the debug info:
GCC: (SUSE Linux) 13.1.1 20230522
stdc++ is dynamically linked, from the package info:
Name : libstdc++6
Version : 13.1.1+git7364-1.2

from bees.

mischaelschill avatar mischaelschill commented on August 17, 2024

I found this patch in the source for the rpm package:
avoid-swap.patch
Could this be the cause? It seems to remove a lock.

from bees.

kakra avatar kakra commented on August 17, 2024

That patch looks fishy... Shouldn't there still be a lock around the clears? Previously, the comment mentions that they explicitly swap the list to the stack so concurrent threads won't have to wait on the clears due to potentially expensive destructors.

from bees.

Zygo avatar Zygo commented on August 17, 2024

Oh wow that patch is very wrong:

  • it blocks worker threads while clearing the cache
  • it runs the destructors in the opposite order, so the Value container is destroyed before iterators referring to the container are destroyed, with potentially undefined behavior
  • it removes the mutex which protects this structure, which normally gets a lot of concurrent updates, resulting in guaranteed undefined behavior.

Remove the patch and everything should work again.

I don't know if that patch was created in response to #255, but it obviously wasn't reviewed here, and this is the wrong solution.

The correct solution is to fix the compiler, since it's doing the data flow analysis wrong. It is correct that the pointer refers to a stack temporary, but the pointer is itself stored in a stack temporary, and both pointer and temporary are destroyed when the variables go out of scope as the function returns. So the compiler is setting a badness-flag and then failing to clear it (or failing to admit that it is unable to determine the correct value of the badness-flag).

It might be possible to reverse the order of the two swap lines to help the compiler understand what is happening:

    180         template <class Return, class... Arguments>
    181         void
    182         LRUCache<Return, Arguments...>::clear()
    183         {
    184                 // Move the map and list onto the stack, then destroy it after we've released the lo    184 ck
    185                 // so that we don't block other threads if the list's destructors are expensive
    186                 decltype(m_list) new_list;
    187                 decltype(m_map) new_map;
    188                 unique_lock<mutex> lock(m_mutex);
    190                 m_map.swap(new_map);                // <-- put map before list
    189                 m_list.swap(new_list);
    191                 lock.unlock();
    192         }

An acceptable workaround would be to add -Wno-error=dangling-pointer to the make flags for building on that particular GCC version. As I mentioned in #255, this doesn't seem to be consistent across all compilers calling themselves GCC 13.

from bees.

Zygo avatar Zygo commented on August 17, 2024

Another possible solution is to simply not copy the map at all:

decltype(m_list) new_list;
unique_lock<mutex> lock(m_mutex);
m_map.clear();
m_list.swap(new_list);
// not strictly needed because it's implied by definition order, but it's clearer to write it explicitly
lock.unlock();

The map has no role in destroying Value elements, so it can be destroyed without affecting the list. The map's destructors are all trivial since they are all POD types (in the instantiations bees uses, if libstdc++ is not insane), so while this does hold the lock longer than needed, it's not much longer than needed.

This would also avoid any side-effects from non-trivial destructors in std::list::iterator, if libstdc++ is foolish enough to add any, because it gets rid of all iterators in the map pointing to the list before getting rid of the list.

from bees.

mischaelschill avatar mischaelschill commented on August 17, 2024

I'll report this to the package maintainer. I was able to compile master locally with the current version of GCC without any issue, which is what I'm currently running. I wasn't able to compile the 0.9.3 tag, maybe there could be a release?

from bees.

biggestsonicfan avatar biggestsonicfan commented on August 17, 2024

Was planning to use bees on SUSE but will wait until I see this resolve itself. Watching.

from bees.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.