Comments (10)

psalz commented on May 19, 2024

After blissfully ignoring this issue for a couple of weeks I ran into it again in a major way a couple of days ago. I decided to take another look and I think I have a solid lead now. It looks like it might actually be two distinct issues (albeit very closely related), with one being surfaced through pruning on hipSYCL's side, and the other being a pure Clang/LLVM bug. Needs a bit more investigation, I'll check back next week!

from adaptivecpp.

illuhad commented on May 19, 2024

The kokkos guys may have a similar (same?) problem: kokkos/kokkos#1547

illuhad commented on May 19, 2024

Here it's suggested that such errors can be caused by using data after it has left scope: kokkos/kokkos#1173

psalz commented on May 19, 2024

I have some news on this: We discovered that the bug appears to have been fixed in the current Clang trunk (i.e., Clang 9); however, an assertion still fails in debug builds. I've also narrowed the fix down to a particular commit and created an issue about it in the LLVM bug tracker: https://bugs.llvm.org/show_bug.cgi?id=41597.

illuhad commented on May 19, 2024

Excellent, thank you!

psalz commented on May 19, 2024

Unfortunately, I have since encountered this issue again, using Clang 9. This means the root cause hasn't actually been fixed; only the circumstances that trigger the bug have changed. I also fear we'll have to dig into Clang ourselves if we want to get this fixed anytime soon...

illuhad commented on May 19, 2024

Okay, let's try figuring this out on our own :) We know that the generated code is identical for both working/non-working versions, with the only difference being the mangled name of the kernel, right? I would propose that we first try to verify whether the issue is in the host-side code generated by Clang:

  • We know it compiles with nvcc, but nvcc likely uses a different mangled kernel name. Let's verify whether the nvcc kernel name is indeed different...
  • ... if it is, let's see what happens if we launch the Clang-compiled PTX kernel without Clang's host-side launch code: we can launch the PTX code directly using the CUDA driver API, based on the kernel name. We need to be careful about kernel parameters (in SYCL, captured accessors), so let's see if we can reproduce the behavior using a kernel that doesn't capture anything (e.g. one that just calls printf) and use that for testing.
  • If we cannot reproduce the issue with a non-capturing kernel, it may be an issue with Clang's implementation of lambda captures or kernel parameters.
  • Otherwise, if the issue also appears when launching the kernel directly with the driver API, it is either
    • an issue with the generated PTX (which would be weird, because we know that the working version is the same except for the kernel name), or
    • a bug in CUDA: perhaps it just has a problem with certain mangled kernel names, which may only be triggered when compiling with Clang
  • If the issue doesn't appear when launching with the driver API
    • it is likely a problem on the host side, related to how Clang invokes the kernel
    • since we use a kernel that doesn't capture anything, it cannot be related to kernel parameters/captures
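
The first check in the list above (are the kernel names different, and is the PTX otherwise identical?) could be automated roughly as follows. This is only a sketch under assumptions: the two PTX listings are already available as strings, kernels are declared with the PTX `.entry` directive, and the helper names `entry_names`, `replace_all` and `same_modulo_names` are made up for illustration and are not part of hipSYCL, Clang or CUDA.

```cpp
#include <cassert>
#include <cctype>
#include <string>
#include <vector>

// Collect the names that follow every ".entry" directive in a PTX listing.
std::vector<std::string> entry_names(const std::string& ptx) {
    std::vector<std::string> names;
    const std::string directive = ".entry";
    std::size_t pos = 0;
    while ((pos = ptx.find(directive, pos)) != std::string::npos) {
        pos += directive.size();
        // Skip whitespace between the directive and the kernel name.
        while (pos < ptx.size() &&
               std::isspace(static_cast<unsigned char>(ptx[pos]))) {
            ++pos;
        }
        // PTX identifiers consist of letters, digits, '_', '$' and '%'.
        std::size_t end = pos;
        while (end < ptx.size() &&
               (std::isalnum(static_cast<unsigned char>(ptx[end])) ||
                ptx[end] == '_' || ptx[end] == '$' || ptx[end] == '%')) {
            ++end;
        }
        if (end > pos) names.push_back(ptx.substr(pos, end - pos));
        pos = end;
    }
    return names;
}

// Replace every occurrence of `from` in `text` with `to`.
std::string replace_all(std::string text, const std::string& from,
                        const std::string& to) {
    assert(!from.empty());
    for (std::size_t pos = 0;
         (pos = text.find(from, pos)) != std::string::npos;
         pos += to.size()) {
        text.replace(pos, from.size(), to);
    }
    return text;
}

// True if both listings are identical once the i-th kernel name in each is
// replaced by the same placeholder. Simplification: plain substring
// replacement, so a kernel name that is a prefix of another name would
// confuse the comparison.
bool same_modulo_names(const std::string& a, const std::string& b) {
    const std::vector<std::string> names_a = entry_names(a);
    const std::vector<std::string> names_b = entry_names(b);
    if (names_a.size() != names_b.size()) return false;
    std::string normalized_a = a;
    std::string normalized_b = b;
    for (std::size_t i = 0; i < names_a.size(); ++i) {
        const std::string placeholder = "KERNEL" + std::to_string(i);
        normalized_a = replace_all(normalized_a, names_a[i], placeholder);
        normalized_b = replace_all(normalized_b, names_b[i], placeholder);
    }
    return normalized_a == normalized_b;
}
```

Calling `entry_names` on the nvcc and Clang listings shows whether the mangled names differ, and `same_modulo_names` confirms (or refutes) that the name is the only difference. The name reported for the Clang-compiled PTX is also the string one would look up when loading the module through the driver API.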

psalz commented on May 19, 2024

I've got a minimal pure CUDA test case and preliminary fix in place, see https://reviews.llvm.org/D64015. If this gets merged it'll also require a change in the Clang plugin (i.e., use getSharedMangleContext), but I'll make a PR once that happens!

illuhad commented on May 19, 2024

Wow, great news! Thank you!

illuhad commented on May 19, 2024

Kernel name mangling issues are well known by now and are addressed in hipSYCL in various ways, depending on the Clang version.
