Giter VIP home page Giter VIP logo

Comments (49)

lhmouse avatar lhmouse commented on June 13, 2024

Thanks for your interest. 😂

Would you please provide some more detailed information (backtrace of each thread for example, which can be obtained using a debugger)?
(I have been keeping an eye on LLVM for a couple of years. Recently they seem to have dropped MinGW support because no documentation about MinGW can be found.)

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

I posted a comment on SF. The first problem didn't seem reproducible.

from mcfgthread.

awson avatar awson commented on June 13, 2024

The first problem manifests itself on vanilla MSys2 gcc only.

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

Fair enough. I am trying to build LLD in MSYS now.

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

How did you overcome these errors?

CMake Error: File /e/Desktop/llvm-project/llvm/E:/Desktop/llvm-project/llvm/lib/Target/PowerPC/MCTargetDesc/LLVMBuild.txt does not exist.
CMake Error at /e/Desktop/build/LLVMBuild.cmake:242 (configure_file):
(... hundreds of errors abridged ...)

from mcfgthread.

awson avatar awson commented on June 13, 2024

I never built all targets, usually I build X86 and NVPTX only, but to debug LLD on mingw, X86 only would be enough, thus simply add -DLLVM_TARGETS_TO_BUILD=X86 when doing cmake.

Something like this would be OK I believe:
cmake -DLLVM_ENABLE_THREADS=true -DCMAKE_BUILD_TYPE=Debug -DBUILD_SHARED_LIBS=ON -DLLVM_TARGETS_TO_BUILD=X86.
(I use shared libs build)

from mcfgthread.

awson avatar awson commented on June 13, 2024

Also paths like /e/Desktop/llvm-project/llvm/E:/Desktop/llvm-project... look very suspicious.
I never use anything other than plain vanilla cmd.exe shell to build LLVM.

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

I followed the instruction here.

Something like this would be OK I believe:
cmake -DLLVM_ENABLE_THREADS=true -DCMAKE_BUILD_TYPE=Debug -DBUILD_SHARED_LIBS=ON -DLLVM_TARGETS_TO_BUILD=X86.
(I use shared libs build)

It didn't help. The same errors took place.

from mcfgthread.

awson avatar awson commented on June 13, 2024

I suspect the culprit is wrong paths story under MSys shell.
I never had the problems you mentioned when using vanilla command-line.

Usually I build things with simple batch files with (roughly) the following structure:

@echo off
set CXX=g++
set CC=gcc
set PATH=<PATH_TO_GCC_BIN_DIR>;%PATH%
cmake -DLLVM_ENABLE_THREADS=true -DCMAKE_BUILD_TYPE=Debug -DBUILD_SHARED_LIBS=ON -DLLVM_INCLUDE_TESTS=OFF -DLLVM_TARGETS_TO_BUILD=X86 -G Ninja LLVM_SRC_PATH
ninja lld

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

The path problem still exists. But this time it is the Unix-y path that is causing the problem:
(Note that I was using the generator for Unix makefile while you were using the generator for Ninja)

CreateProcess failed: The system cannot find the file specified.

full-log.txt

from mcfgthread.

awson avatar awson commented on June 13, 2024

Did you ever tried to use vanilla command prompt (cmd.exe) instead of unixish one?

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

Yes, neither worked.

Since I am using the MSYS2 cmake, paths are converted to Unix-y ones, so cmake sees the same path in either case:

The C compiler "/mingw32/bin/gcc.exe" is not able to compile a simple test
program.

from mcfgthread.

awson avatar awson commented on June 13, 2024

Ah, I use native cmake (mingw or precompiled package from cmake.org), absolutely no problems with it.

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

There is another error now:

E:\Desktop\build>ninja lld
ninja: error: unknown target 'lld', did you mean 'llc'?

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

... You just missed -DLLVM_ENABLE_PROJECTS=lld. I added that option and it builds now.

from mcfgthread.

awson avatar awson commented on June 13, 2024

Btw, lld on windows doesn't support windows-gnu target linking, thus to reproduce the problem one should either have xxx-windows-msvc object code (that produced by Visual C/C++ compiler or clang with corresponding target) to link. Do you have Visual C/C++ installed?

Perhaps, it would be possible to test lld against elf code linking, but this way one should have elf code to link. I never tried the latter, though.

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

Yes I have reproduced the problem:

E:\Desktop>cat main.c
#include <stdio.h>

extern void bark();

int main(){
        printf("inside %s\n", __func__);
        bark();
}

E:\Desktop>cat  test.c
#include <stdio.h>

void bark(){
        printf("inside %s\n", __func__);
}

E:\Desktop>cl /nologo /c main.c test.c
main.c
test.c
正在生成代码...

E:\Desktop>build\bin\lld-link.exe main.obj test.obj

Then it hangs.

from mcfgthread.

awson avatar awson commented on June 13, 2024

Btw, ExitProcess also deadlocks, only TerminateProcess doesn't.

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

Btw, ExitProcess also deadlocks, only TerminateProcess doesn't.

That is because ExitProcess() sends all DLLs loaded a DLL_PROCESS_DETACH notification, and the thread that called ExitProcess() hangs up on `NtReleaseKeyedEvent()·.

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

I attached LLD with a modified version of mcfgthread that would trigger a breakpoint if NtReleaseKeyedEvent() times out in 5 seconds.

Note that inside a DLL, all destructors of static objects are called by DllMain() with the dwReason parameter set to 0 (DLL_PROCESS_DETACH). Here the destructor ThreadPoolExecutor::~ThreadPoolExecutor() called std::condition_variable::notify_all(), which called __MCFCRT_gthread_cond_broadcast(), which thought there were other threads sleeping on the keyed event (the condition variable itself) and called NtReleaseKeyedEvent() and deadlocked.

There should indeed be threads sleeping on the condition variable before exit(), which I believe is exactly why the LLVM people call std::condition_variable::notify_all() inside the destructor. But remember, inside the DllMain() function, all other threads will have been terminated. This is Windows-specific behavior.

Full backtrace:

Thread 1 received signal SIGTRAP, Trace/breakpoint trap.
[Switching to Thread 13924.0x3308]
0x768f338e in KERNELBASE!DebugBreak () from C:\Windows\syswow64\KernelBase.dll
(gdb) bt
#0  0x768f338e in KERNELBASE!DebugBreak () from C:\Windows\syswow64\KernelBase.dll
#1  0x661023fa in mcfgthread-10!_MCFCRT_Bail () from E:\Desktop\build\bin\mcfgthread-10.dll
#2  0x661071fc in mcfgthread-10!__MCFCRT_OnAssertionFailure () from E:\Desktop\build\bin\mcfgthread-10.dll
#3  0x66103373 in mcfgthread-10!_MCFCRT_BroadcastConditionVariable () from E:\Desktop\build\bin\mcfgthread-10.dll
#4  0x6fee5e2b in  	 (__cond=<optimized out>) at C:/MinGW/MSYS2/mingw32/include/mcfgthread/env/_gthread_inl.h:177
#5  std::condition_variable::notify_all (this=<optimized out>) at ../../../../../gcc/libstdc++-v3/src/c++11/condition_variable.cc:73
#6  0x70e7e793 in lld::internal::ThreadPoolExecutor::~ThreadPoolExecutor (this=0x70ecb640 <lld::internal::getDefaultExecutor()::exec>, __in_chrg=<optimized out>)
    at E:/Desktop/llvm-project/lld/include/lld/Core/Parallel.h:137
#7  0x70e5aed8 in __tcf_0 () at E:/Desktop/llvm-project/lld/include/lld/Core/Parallel.h:173
#8  0x70e411a4 in _CRT_INIT@12 (hDllHandle=hDllHandle@entry=0x70e40000, dwReason=dwReason@entry=0, lpreserved=lpreserved@entry=0x1)
    at E:/GitHub/MINGW-packages/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtdll.c:144
#9  0x70e412b4 in __DllMainCRTStartup (hDllHandle=hDllHandle@entry=0x70e40000, dwReason=0, lpreserved=lpreserved@entry=0x1)
    at E:/GitHub/MINGW-packages/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtdll.c:211
#10 0x70e413e7 in DllMainCRTStartup@12 (hDllHandle=0x70e40000, dwReason=0, lpreserved=0x1) at E:/GitHub/MINGW-packages/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtdll.c:171
#11 0x77969364 in ntdll!RtlQueryEnvironmentVariable () from C:\Windows\SysWOW64\ntdll.dll
#12 0x70e40000 in ?? ()
#13 0x77988f68 in ntdll!LdrShutdownProcess () from C:\Windows\SysWOW64\ntdll.dll
#14 0x70e413c4 in __DllMainCRTStartup (hDllHandle=0x148ed18, dwReason=2382904, lpreserved=0x1) at E:/GitHub/MINGW-packages/mingw-w64-crt-git/src/mingw-w64/mingw-w64-crt/crt/crtdll.c:201
#15 0x00000000 in ?? ()
(gdb)

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

Briefly, the following happened:

  1. A thread slept on a condition variable. The 'threads_trapped' counter of the condition variable was incremented.
  2. Another thread called ExitProcess() [Update: I incorrectly referred ExitThread() here], terminating the previous thread while it was sleeping.
  3. The second thread broadcast the condition variable. It read and cleared the 'threads_trapped' counter. Because the counter was non-zero, it called NtReleaseKeyedEvent() to wake up those threads sleeping. Note that NtReleaseKeyedEvent() blocks the caller until a thread is woken up. Since there were no threads to wake up, it effectively resulted in a deadlock.

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

Possible workaround: If there are no other threads, don't call NtReleaseKeyedEvent() in ReallySignalConditionVariable(), ReallySignalMutex() or RealSetAndSignalOnceFlag().

... But how can we tell there are no other threads?

from mcfgthread.

awson avatar awson commented on June 13, 2024

Another thread called ExitThread(), terminating the previous thread while it was sleeping

Don't quite understand this. Why the "previous" thread is terminated? Because it was created by "another" thread or what?

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

See text following 'When handling DLL_PROCESS_DETACH' in https://msdn.microsoft.com/en-us/library/windows/desktop/ms682583(v=vs.85).aspx .

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

Oops it was a typo. It was ExitProcess() that the second thread called. I fixed up the comment.

from mcfgthread.

awson avatar awson commented on June 13, 2024
  1. This is a problem only when threads are brutally killed by ExitProcess, otherwise we won't get in this situation;
  2. When the process exits anyway, we don't need to bother with releasing resources explicitly;

The let's add a flag indicating that we are just about to exit and add an exit handler (using atexit) with sets this flag on. Now we may call NtReleaseKeyedEvent only if the flag is off.

This won't prevent the deadlock from using ExitProcess directly (which is non-preventable in general AFAIUI), but at least should cure exit or return from main.

Thoughts?

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

Not a solution.

  1. The mingw-w64 project has their own atexit() implementation, which is module(exe + dll)-wide and not process-wide. Callbacks of a DLL are also called as a result of FreeLibrary().
  2. Handlers registered by atexit() are called in FILO order. We have to register one after the ctor of the thread pool returns, which isn't practically possible without modifying LLVM code.

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

The FUTEX API does not have such bugs AFAICS but they ain't available on Windows 7. Pity.

Anyway, I personally think it is undefined behavior to call std::exit() while more than one thread exist and it is a bug in LLD code. LLD should call std::quick_exit() instead.

from mcfgthread.

awson avatar awson commented on June 13, 2024

This isn't a problem of std::exit(). I've looked into why they introduced it in the first place. It turned out they merely shaved off 2s by not running destructors when linking huge chromium exe.

The program deadlocks also when returns normally. Try to comment exit out. This won't help.

from mcfgthread.

awson avatar awson commented on June 13, 2024

I've looked into mingw-w64 code. Indeed, they introduced their own atexit, but interestingly, they don't call ExitProcess directly (as Microsoft does), but call standard exit (from msvcrt), thus using msvcrt's atexit it's possible to set exit handlers for it (didn't think about how to accomplish this, though).

Also, I don't quite understand why we have to register one after the ctor of the thread pool returns.

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

According to the C++ standard calling std::exit() is equivalent to returning from main() except that automatic objects are not destroyed. In C both are totally equivalent.

I suggest you ask LLD people for their opinion about it. If they are willing to fix their code I will close this issue as WONTFIX. Meanwhile, I am also searching for solutions.

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

Also, I don't quite understand why we have to register one after the ctor of the thread pool returns.

It is simple: If we register the magic callback before the ctor, it will only get called after the dtor that is causing the deadlock (virtually not called at all).

from mcfgthread.

awson avatar awson commented on June 13, 2024

Removing exit doesn't help. LLD deadlocks. Otherwise I wouldn't even bother to open this ticket.

From your explanations I've understood that the deadlock can happen only if sleeping threads are killed by ExitProcess. But all exit handlers are run before ExitProcess call, hence NtReleaseKeyedEvent won't deadlock.

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

But all exit handlers are run before ExitProcess call, hence NtReleaseKeyedEvent won't deadlock.

Please double check this for sure. In the case of DLL such handlers are run from the entry point function (__DllMainCRTStartup() above) which is a result of ExitProcess().

from mcfgthread.

awson avatar awson commented on June 13, 2024

I even tried to put my own exit handler into an LLD code, which simply printed something to the console, and it completed successfully before the deadlock. Thus we have no problems with either mingw-w64 or FILO order.

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

Actually GCC uses atexit() to register dtors of static objects. If you register a callback after the pool's ctor returns, your callback is called before its dtor. If you register a callback before its ctor returns (including inside its ctor body), your callback is called after its dtor.

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

Again, it is not possible to register a callback that can be called before the pool's dtor without modifying LLD' source, as it requires registering it after the pool's ctor (for example, inside the main() function.)

from mcfgthread.

awson avatar awson commented on June 13, 2024

I am not ready to investigate this right now, but this looks extremely strange.

How could atexit semantics depends on the place where it is called? Only handlers execution order depends on it but all of them should be guaranteed to be executed before the exit proper.

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

How could atexit semantics depends on the place where it is called?

The C standard doesn't care about dynamic libraries, but it is true on almost every platform where there are dynamic libraries. The reason is simple, too: If you load a dynamic library which registers some exit callbacks and then unload it, those callbacks must be run and removed before the virtual memory of the dynamic library is unmapped. If those callbacks were deferred to the time when you realy call atexit() (directly or by returning from main()), the dynamic library might have been unmapped and you would likely get a segment fault.

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

Please test the new branch. If this is OK I will port it to both branches.

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

Thank goodness I found the function RtlDllShutdownInProgress() and it saved us!
It was just an accident, wasn't it? 😂

from mcfgthread.

awson avatar awson commented on June 13, 2024

Looks like it does the trick for me! Thanks!

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

Merged to master and cherry-picked to release-1.x then.

from mcfgthread.

wwc7654321 avatar wwc7654321 commented on June 13, 2024

Thanks for sharing you solution. @lhmouse
Is this page described the same problem?
https://support.microsoft.com/zh-cn/help/2582203/a-process-that-is-being-terminated-stops-responding-in-windows-7-or-in

I think it's fixed as a bug, but I can't expect every PC is patched.

from mcfgthread.

wwc7654321 avatar wwc7654321 commented on June 13, 2024

And is there any page describes "RtlDllShutdownInProgress" ? It's hard to find discussion about this function

from mcfgthread.

MaskRay avatar MaskRay commented on June 13, 2024

Since I was just reading lld code, if you want to learn more about lld+exit https://reviews.llvm.org/D58246

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

@wwc7654321 No. If a thread calls exit() or ExitProcess() without waiting for other threads to terminate then it must not call any synchronization function, nor should it perform memory allocation or deallocation, as the process state is not predictable.

@MaskRay std::thread::detach() is nonsense for a hosted program. Don't use it, period.

from mcfgthread.

nga888 avatar nga888 commented on June 13, 2024

If you want to see yet more related to LLD exit (using MSVC static run-time), check out my experimental patch here: https://reviews.llvm.org/D40366. I agree that detached threads are not good.

from mcfgthread.

lhmouse avatar lhmouse commented on June 13, 2024

Specifically on Windows, if threads are detached then it is not safe to call std::exit() or std::quick_exit(), or to return from main() which has the same effect as calling std::exit(). The program must call std::_Exit().

from mcfgthread.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.