Comments (3)
Julia MWE:
using Metal
function local_kernel(a, val::T) where T
    i = thread_position_in_grid_1d()
    b = MtlThreadGroupArray(T, 1)
    Metal.atomic_store_explicit(pointer(b, i), val)
    a[i] = b[i]
    return
end
T = Int32
a = Metal.zeros(T, 1)
@metal local_kernel(a, T(42))
Looking at Console, it seems like the Metal compiler is crashing:
Crashed Thread: 2
Exception Type: EXC_BAD_ACCESS (SIGSEGV)
Exception Codes: KERN_INVALID_ADDRESS at 0x0000000000000018
Exception Codes: 0x0000000000000001, 0x0000000000000018
Termination Reason: Namespace SIGNAL, Code 11 Segmentation fault: 11
Terminating Process: exc handler [85705]
VM Region Info: 0x18 is not in any region. Bytes before following region: 4375478248
REGION TYPE START - END [ VSIZE] PRT/MAX SHRMOD REGION DETAIL
UNUSED SPACE AT START
--->
__TEXT 104cc8000-104ccc000 [ 16K] r-x/r-x SM=COW ...mpilerService
Thread 0:
0 libsystem_kernel.dylib 0x18c57cf88 __semwait_signal_nocancel + 8
1 libsystem_c.dylib 0x18c471c04 nanosleep$NOCANCEL + 216
2 libsystem_c.dylib 0x18c471ad0 sleep$NOCANCEL + 52
3 libdispatch.dylib 0x18c4177dc _dispatch_queue_cleanup2 + 212
4 libsystem_pthread.dylib 0x18c5b39fc _pthread_tsd_cleanup + 620
5 libsystem_pthread.dylib 0x18c5b6724 _pthread_exit + 84
6 libsystem_pthread.dylib 0x18c5b372c pthread_exit + 88
7 libdispatch.dylib 0x18c4132b0 dispatch_main + 128
8 libxpc.dylib 0x18c2e1ca4 _xpc_objc_main + 752
9 libxpc.dylib 0x18c2f0904 _xpc_main + 324
10 libxpc.dylib 0x18c2e180c xpc_main + 64
11 MTLCompilerService 0x104cc9cb4 main + 420
12 dyld 0x18c2350e0 start + 2360
Thread 1:
0 libsystem_kernel.dylib 0x18c57cfc0 __sigsuspend_nocancel + 8
1 libdispatch.dylib 0x18c4178dc _dispatch_sigsuspend + 48
2 libdispatch.dylib 0x18c4178ac _dispatch_sig_thread + 60
Thread 2 Crashed:
0 libLLVM.dylib 0x20a4a79bc llvm::TargetRegisterInfo::getRegSizeInBits(llvm::Register, llvm::MachineRegisterInfo const&) const + 192
1 libLLVM.dylib 0x209cb91c0 0x209a97000 + 2236864
2 libLLVM.dylib 0x209cb74b4 0x209a97000 + 2229428
3 libLLVM.dylib 0x20a33de40 llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 408
4 libLLVM.dylib 0x20a5caa98 llvm::FPPassManager::runOnFunction(llvm::Function&) + 668
5 libLLVM.dylib 0x209fc3b58 0x209a97000 + 5426008
6 libLLVM.dylib 0x20a5caf58 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 756
7 libLLVM.dylib 0x209f5e4b8 0x209a97000 + 5010616
8 libLLVM.dylib 0x209f5a318 0x209a97000 + 4993816
9 libLLVM.dylib 0x209f5fb2c llvm::AGX::AGXCompilePlan::execute(llvm::AGX::CompileRequest&) + 760
10 AGXCompilerCore 0x1b0936cb8 AGCLLVMCtx::compile(AGCLLVMObject*, llvm::Module&, AGCFastMathFlags, llvm::AGX::PipelineType, llvm::AGX::CodeGenOptions&, bool) + 1372
11 AGXCompilerCore 0x1b08c2a98 AGCLLVMUserObject::compile() + 21872
12 AGXCompilerCore 0x1b0803d00 AGCCodeGenServiceBuildRequestInternal(AGCCodeGenService*, void const*, unsigned long, void const*, unsigned long, llvm::Module*, AGCTargetArch, _AGCStreamToken&, void const**, unsigned long*, APIKind) + 6764
13 AGXCompilerCore 0x1b08021c4 MTLCompilerBuildRequestWithOptions + 124
14 MTLCompiler 0x2130b1ea4 MTLCompilerPluginInterface::compilerBuildRequest(bool, unsigned int, void const*, unsigned long, unsigned int, void const*, BackendCompilationOutput&) + 256
15 MTLCompiler 0x2130ad6fc MTLCompilerObject::backendCompileModule(BinaryRequestData&, BackendCompilationOutput&, unsigned long long, std::__1::vector<CompileTimeData, std::__1::allocator<CompileTimeData>>&) + 656
16 MTLCompiler 0x2130b21c8 MTLCompilerObject::backendCompileExecutableRequest(BinaryRequestData&) + 448
17 MTLCompiler 0x2130b483c MTLCompilerObject::buildRequest(unsigned int, unsigned int, void const*, unsigned long, void (unsigned int, void const*, unsigned long, char const*) block_pointer) + 592
18 MTLCompiler 0x2130beb84 split_stack_call_impl + 24
19 MTLCompiler 0x2130bec54 split_stack_call + 196
20 MTLCompiler 0x2130b2674 MTLCodeGenServiceBuildRequest + 328
21 MTLCompilerService 0x104cca13c invocation function for block in MTLCompilerServiceHandleEvent(NSObject<OS_xpc_object>*) + 1036
22 libxpc.dylib 0x18c2d7848 _xpc_connection_call_event_handler + 144
23 libxpc.dylib 0x18c2d61d8 _xpc_connection_mach_event + 1384
24 libdispatch.dylib 0x18c4049d0 _dispatch_client_callout4 + 20
25 libdispatch.dylib 0x18c420c5c _dispatch_mach_msg_invoke + 468
26 libdispatch.dylib 0x18c40bd28 _dispatch_lane_serial_drain + 368
27 libdispatch.dylib 0x18c421998 _dispatch_mach_invoke + 444
28 libdispatch.dylib 0x18c40bd28 _dispatch_lane_serial_drain + 368
29 libdispatch.dylib 0x18c40c9d4 _dispatch_lane_invoke + 380
30 libdispatch.dylib 0x18c41761c _dispatch_root_queue_drain_deferred_wlh + 288
31 libdispatch.dylib 0x18c416e90 _dispatch_workloop_worker_thread + 404
32 libsystem_pthread.dylib 0x18c5b2114 _pthread_wqthread + 288
33 libsystem_pthread.dylib 0x18c5b0e30 start_wqthread + 8
Thread 3:
0 libsystem_pthread.dylib 0x18c5b0e28 start_wqthread + 0
Thread 4:
0 libsystem_pthread.dylib 0x18c5b0e28 start_wqthread + 0
Thread 2 crashed with ARM Thread State (64-bit):
x0: 0x000000023e7bf290 x1: 0x0000000000000000 x2: 0x00000001468c4200 x3: 0x00000001468c4200
x4: 0x00006000031a6950 x5: 0x0000000000000008 x6: 0x0000002e00000048 x7: 0x000000000004e400
x8: 0x000000023a83c540 x9: 0x0000000000000072 x10: 0x0000000000000000 x11: 0x0000000000000000
x12: 0x0000000000000000 x13: 0x0000000000000000 x14: 0x000000023e7b5330 x15: 0xfffffffffffff000
x16: 0x00000002434d2930 x17: 0xc3310002434d2930 x18: 0x0000000000000000 x19: 0x0000000146887408
x20: 0x00000001468c4200 x21: 0x0000000146887408 x22: 0x0000000000000000 x23: 0x0000000000000000
x24: 0x00000001468c4200 x25: 0x0000000000000000 x26: 0x0000000000000001 x27: 0x0000000000000190
x28: 0x00000001078f1bd8 fp: 0x00000001078f16e0 lr: 0x9a21000209cb91c0
sp: 0x00000001078f16b0 pc: 0x000000020a4a79bc cpsr: 0xa0001000
far: 0x0000000000000018 esr: 0x92000006 (Data Abort) byte read Translation fault
Binary Images:
0x104cc8000 - 0x104ccbfff com.apple.MTLCompilerService (341.36) <74b60d6a-e03e-3b2a-a911-dc183eba3e13> /System/Library/Frameworks/Metal.framework/Versions/A/XPCServices/MTLCompilerService.xpc/Contents/MacOS/MTLCompilerService
0x18c574000 - 0x18c5aefff libsystem_kernel.dylib (*) <a7228b5d-53c7-3fe9-84e4-2a8c04dcf051> /usr/lib/system/libsystem_kernel.dylib
0x18c44b000 - 0x18c4c9ffb libsystem_c.dylib (*) <cc7a439e-f104-3047-995c-9a5a3cc4a442> /usr/lib/system/libsystem_c.dylib
0x18c401000 - 0x18c447fff libdispatch.dylib (*) <5aa1649c-ef1d-39f7-a66c-4c5d2e53c474> /usr/lib/system/libdispatch.dylib
0x18c5af000 - 0x18c5bbff3 libsystem_pthread.dylib (*) <449bbad3-f7ef-371d-9a59-fd4ffa78289b> /usr/lib/system/libsystem_pthread.dylib
0x18c2c9000 - 0x18c30ffff libxpc.dylib (*) <147ea529-cecc-34fb-8250-b2a3893d3c3e> /usr/lib/system/libxpc.dylib
0x18c22f000 - 0x18c2c3387 dyld (*) <50746901-db0e-39a0-b391-baaa6b82ad0f> /usr/lib/dyld
0x0 - 0xffffffffffffffff ??? (*) <00000000-0000-0000-0000-000000000000> ???
0x209a97000 - 0x20c85aff7 libLLVM.dylib (*) <04dd424e-dd70-3034-b1f8-ef9b275e1d8a> /System/Library/PrivateFrameworks/GPUCompiler.framework/Versions/32023/Libraries/libLLVM.dylib
0x1b07fe000 - 0x1b09e9fff com.apple.AGXCompilerCore (276.62) <9b9da650-3c5c-34f8-9934-d6d4ee14a40c> /System/Library/PrivateFrameworks/AGXCompilerCore.framework/Versions/A/AGXCompilerCore
0x21300b000 - 0x2130d5fff com.apple.MTLCompiler (341.36) <3ddc8ba3-71bd-3745-8443-0ca59c0b1c45> /System/Library/PrivateFrameworks/MTLCompiler.framework/Versions/32023/MTLCompiler
from metal.jl.
LLVM MWE:
target datalayout = "e-p:64:64:64-i1:8:8-i8:8:8-i16:16:16-i32:32:32-i64:64:64-f32:32:32-f64:64:64-v16:16:16-v24:32:32-v32:32:32-v48:64:64-v64:64:64-v96:128:128-v128:128:128-v192:256:256-v256:256:256-v512:512:512-v1024:1024:1024-n8:16:32"
target triple = "air64-apple-macosx14.3.1"
@threadgroup_memory = external addrspace(3) global [1 x i32]
declare void @air.atomic.local.store.i32(i32 addrspace(3)*, i32, i32, i32)
define void @my_kernel(i32 %thread_position_in_grid) {
conversion:
  %0 = zext i32 %thread_position_in_grid to i64
  %1 = getelementptr i8, i8* addrspacecast (i8 addrspace(3)* bitcast ([1 x i32] addrspace(3)* @threadgroup_memory to i8 addrspace(3)*) to i8*), i64 %0
  %2 = bitcast i8* %1 to i32*
  %3 = addrspacecast i32* %2 to i32 addrspace(3)*
  call void @air.atomic.local.store.i32(i32 addrspace(3)* %3, i32 0, i32 0, i32 0)
  ret void
}
!air.kernel = !{!0}
!air.version = !{!6}
!0 = !{void (i32)* @my_kernel, !1, !2}
!1 = !{}
!2 = !{!3}
!3 = !{i32 2, !"air.thread_position_in_grid", !"air.arg_type_name", !"uint"}
!6 = !{i32 3, i32 0, i32 0}
It seems like the back-end is having trouble looking through the bitcast->addrspacecast->gep->bitcast->addrspacecast chain applied to the threadgroup memory pointer, which admittedly is a bit much.
Turns out this was really caused by illegal addrspacecasts during pointer arithmetic. Those mostly get optimized away (which is questionable in itself, but we lack an AIR TargetTransformInfo to make that illegal), but not always, as in this case.
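For illustration, the pointer arithmetic from the LLVM MWE above can be written so that every pointer stays in the threadgroup address space and no addrspacecast is needed. This is only a sketch of the shape of the IR, reusing the names from the MWE with the target lines and metadata omitted. It has not been checked against the Metal back-end, and it is not necessarily what an actual fix would emit:

```llvm
@threadgroup_memory = external addrspace(3) global [1 x i32]

declare void @air.atomic.local.store.i32(i32 addrspace(3)*, i32, i32, i32)

define void @my_kernel(i32 %thread_position_in_grid) {
conversion:
  %0 = zext i32 %thread_position_in_grid to i64
  ; View the threadgroup array as bytes without leaving addrspace(3),
  ; then apply the same byte offset as in the MWE.
  %1 = getelementptr i8, i8 addrspace(3)* bitcast ([1 x i32] addrspace(3)* @threadgroup_memory to i8 addrspace(3)*), i64 %0
  ; Only the pointee type changes here; the address space is untouched.
  %2 = bitcast i8 addrspace(3)* %1 to i32 addrspace(3)*
  call void @air.atomic.local.store.i32(i32 addrspace(3)* %2, i32 0, i32 0, i32 0)
  ret void
}
```

The only difference from the MWE is that the getelementptr and bitcast operate on addrspace(3) pointers directly instead of round-tripping through the generic address space.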