Comments (10)
After blissfully ignoring this issue for a couple of weeks I ran into it again in a major way a couple of days ago. I decided to take another look and I think I have a solid lead now. It looks like it might actually be two distinct issues (albeit very closely related), with one being surfaced through pruning on hipSYCL's side, and the other being a pure Clang/LLVM bug. Needs a bit more investigation, I'll check back next week!
from adaptivecpp.
The kokkos guys may have a similar (same?) problem: kokkos/kokkos#1547
from adaptivecpp.
Here it's suggested that such errors can be caused by using data after it has left scope: kokkos/kokkos#1173
from adaptivecpp.
I have some news on this: We discovered that the bug appears to have been fixed in the current Clang trunk (i.e., Clang 9); however, an assertion still fails in debug builds. I've also narrowed the fix down to a particular commit, and created an issue about it in the LLVM bug tracker: https://bugs.llvm.org/show_bug.cgi?id=41597.
from adaptivecpp.
Excellent, thank you!
from adaptivecpp.
Unfortunately I have since encountered this issue again, using Clang 9. This means the root cause really hasn't been fixed, only the circumstances triggering the bug are different. I also fear we'll have to dig into Clang ourselves if we want to get this fixed anytime soon...
from adaptivecpp.
Okay, let's try figuring this out on our own :) We know that the generated code is identical for both working/non-working versions, with the only difference being the mangled name of the kernel, right? I would propose that we first try to verify if the issue is on the host side as generated by clang:
- We know it compiles with nvcc, but nvcc likely uses a different mangled kernel name. Let's verify if the nvcc kernel name is indeed different...
- ... if this is indeed the case, let's see what happens if we launch the clang-compiled PTX kernel without clang: We can launch the PTX code directly using the CUDA driver API, based on the kernel name. We need to be careful about kernel parameters (in SYCL, captured accessors), so let's see if we can reproduce the behavior using a kernel that doesn't capture anything (e.g. just calls `printf`) and use that for testing.
  - If we cannot reproduce the issue with a non-capturing kernel, it may be an issue with clang's implementation of lambda captures or kernel parameters.
  - Otherwise, if the issue also appears when launching the kernel directly with the driver API, it can either be
    - an issue with the generated PTX (which would be weird, because we know that the working version is the same except for the kernel name)
    - It may be a bug in CUDA - perhaps it just has a problem with certain mangled kernel names, which may only be triggered when compiling with clang
  - If the issue doesn't appear
    - It is likely a problem on the host side, related to how clang invokes the kernel
    - Since we use a kernel that doesn't capture anything it cannot be related to kernel parameters/captures
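For the first step (checking whether nvcc and clang emit different kernel names), here is a minimal sketch of a helper that lists the kernel entry points found in a PTX listing. It relies only on the fact that PTX declares kernels with the `.entry` directive; the function names `extract_entry_names` and `only_in_first` are hypothetical and not part of hipSYCL, Clang or CUDA, and the sketch does not try to skip `.entry` occurrences inside PTX comments.

```cpp
#include <algorithm>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: collect the identifier following every ".entry"
// directive in a PTX listing. For a kernel declared as
// ".visible .entry _Z6kernelv(" this yields "_Z6kernelv".
std::vector<std::string> extract_entry_names(const std::string& ptx) {
    std::vector<std::string> names;
    const std::string directive = ".entry";
    std::size_t pos = 0;
    while ((pos = ptx.find(directive, pos)) != std::string::npos) {
        pos += directive.size();
        // Skip whitespace between the directive and the kernel name.
        while (pos < ptx.size() &&
               std::isspace(static_cast<unsigned char>(ptx[pos]))) {
            ++pos;
        }
        const std::size_t start = pos;
        // Accept letters, digits, '_' and '$' as identifier characters.
        while (pos < ptx.size() &&
               (std::isalnum(static_cast<unsigned char>(ptx[pos])) ||
                ptx[pos] == '_' || ptx[pos] == '$')) {
            ++pos;
        }
        if (pos > start) {
            names.push_back(ptx.substr(start, pos - start));
        }
    }
    return names;
}

// Hypothetical helper: names present in `a` but missing from `b`,
// e.g. kernel names that clang emits but nvcc does not.
std::vector<std::string> only_in_first(const std::vector<std::string>& a,
                                       const std::vector<std::string>& b) {
    std::vector<std::string> result;
    for (const std::string& name : a) {
        if (std::find(b.begin(), b.end(), name) == b.end()) {
            result.push_back(name);
        }
    }
    return result;
}
```

Running this over the PTX from both compilers and diffing the results would show whether the mangled names really differ. A name found this way is also what you would hand to `cuModuleGetFunction` (after loading the PTX with `cuModuleLoadData`) for the driver API experiment.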
from adaptivecpp.
I've got a minimal pure CUDA test case and preliminary fix in place, see https://reviews.llvm.org/D64015. If this gets merged it'll also require a change in the Clang plugin (i.e., use `getSharedMangleContext`), but I'll make a PR once that happens!
from adaptivecpp.
Wow, great news! Thank you!
from adaptivecpp.
Kernel name mangling issues are well known by now and addressed in hipSYCL in various ways, depending on clang version.
from adaptivecpp.