Comments (7)

maleadt commented on May 25, 2024

#135 should make vadd1 possible.

maleadt commented on May 25, 2024

You're actually passing a type, while you should be specializing on it instead (kernel(::Type{T}) where T).

bjarthur commented on May 25, 2024

thanks for the quick reply. that gives the same error though.

i think the fundamental problem is that type conversion is not allowed in a kernel: the modified function below, which does NOT take a type argument nor specialize, but simply calls Float32 directly, gives the same error.

julia> using Metal

julia> function vaddT(a, b, c)
           i = thread_position_in_grid_1d()
           c[i] = a[i] + Float32(b[i])
           return
       end
vaddT (generic function with 1 method)

julia> a = MtlArray([1,1,1,1]); b = MtlArray([2,2,2,2]); c = similar(a);

julia> @metal threads=2 groups=2 vaddT(a, b, c)
ERROR: InvalidIRError: compiling kernel #vaddT(MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to gpu_malloc)
Stacktrace:
 [1] malloc
   @ ~/.julia/packages/GPUCompiler/S3TWf/src/runtime.jl:89
 [2] macro expansion
   @ ~/.julia/packages/GPUCompiler/S3TWf/src/runtime.jl:184
 [3] macro expansion
   @ ./none:0
 [4] box
   @ ./none:0
 [5] box_float32
   @ ~/.julia/packages/GPUCompiler/S3TWf/src/runtime.jl:213
 [6] Int64
   @ ./float.jl:788
 [7] convert
   @ ./number.jl:7
 [8] setindex!
   @ ~/.julia/dev/Metal/src/device/array.jl:105
 [9] vaddT
   @ ./REPL[2]:3
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams, GPUCompiler.FunctionSpec{typeof(vaddT), Tuple{MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/validation.jl:141
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/S3TWf/src/driver.jl:418 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/S3TWf/src/driver.jl:417 [inlined]
  [5] emit_llvm(job::GPUCompiler.CompilerJob, method_instance::Any; libraries::Bool, deferred_codegen::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool, ctx::LLVM.Context)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/utils.jl:83
  [6] mtlfunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:166
  [7] #40
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:161 [inlined]
  [8] JuliaContext(f::Metal.var"#40#41"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams, GPUCompiler.FunctionSpec{typeof(vaddT), Tuple{MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}}}}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/driver.jl:76
  [9] mtlfunction_compile(job::GPUCompiler.CompilerJob)
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:160
 [10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(Metal.mtlfunction_compile), linker::typeof(Metal.mtlfunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/cache.jl:90
 [11] mtlfunction(f::typeof(vaddT), tt::Type{Tuple{MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:148
 [12] mtlfunction(f::typeof(vaddT), tt::Type{Tuple{MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}}})
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:141
 [13] top-level scope
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:64
 [14] top-level scope
    @ ~/.julia/dev/Metal/src/initialization.jl:33

bjarthur commented on May 25, 2024

more debugging data, which may or may not be related / informative: outside of a kernel, broadcasting Float32 works, but broadcasting sin on its own does not:

julia> Float32.(b)
4-element MtlVector{Float32}:
 2.0
 2.0
 2.0
 2.0

julia> sin.(Array(b))
4-element Vector{Float64}:
 0.9092974268256817
 0.9092974268256817
 0.9092974268256817
 0.9092974268256817

julia> sin.(Float32.(b))
4-element MtlVector{Float32}:
 0.9092974
 0.9092974
 0.9092974
 0.9092974

julia> sin.(b)
ERROR: InvalidIRError: compiling kernel #broadcast_kernel#28(Metal.mtlKernelContext, MtlDeviceVector{Float64, 1}, Base.Broadcast.Broadcasted{Metal.MtlArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(sin), Tuple{Base.Broadcast.Extruded{MtlDeviceVector{Int64, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64) resulted in invalid LLVM IR
Reason: unsupported use of double floating-point value
Stacktrace:
 [1] reinterpret
   @ ./essentials.jl:438
 [2] paynehanek
   @ ./special/rem_pio2.jl:139
 [3] rem_pio2_kernel
   @ ./special/rem_pio2.jl:282
 [4] sin
   @ ./special/trig.jl:41
 [5] sin
   @ ./math.jl:1372
 [6] _broadcast_getindex_evalf
   @ ./broadcast.jl:670
 [7] _broadcast_getindex
   @ ./broadcast.jl:643
 [8] getindex
   @ ./broadcast.jl:597
 [9] broadcast_kernel
   @ ~/.julia/packages/GPUArrays/6STCb/src/host/broadcast.jl:59

maleadt commented on May 25, 2024

You are mixing different bug reports in a single issue; please keep them separate (also, for questions like this, Discourse is better suited).


julia> sin.(b)
ERROR: InvalidIRError: compiling kernel #broadcast_kernel#28(Metal.mtlKernelContext, MtlDeviceVector{Float64, 1}, Base.Broadcast.Broadcasted{Metal.MtlArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(sin), Tuple{Base.Broadcast.Extruded{MtlDeviceVector{Int64, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64) resulted in invalid LLVM IR
Reason: unsupported use of double floating-point value

Works for me:

julia> b = rand(Float32, 1)
1-element Vector{Float32}:
 0.9079935

julia> sin.(MtlArray(b))
1-element MtlVector{Float32}:
 0.78827065

ERROR: InvalidIRError: compiling kernel #vaddT(MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to gpu_malloc)

That's because Metal currently does not support exceptions (#69), and assigning a Float32 to an Int array (like you're doing in your kernel) can throw an InexactError.
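
One way around it for this particular kernel is to avoid the fallible conversion altogether, e.g. by giving c a Float32 element type (just a sketch, untested):

# sketch: with a Float32 output array, setindex! stores a Float32 directly and
# the Float32 -> Int64 conversion (and its exception path) never runs
a = MtlArray([1, 1, 1, 1])
b = MtlArray([2, 2, 2, 2])
c = MtlArray(zeros(Float32, 4))
@metal threads=2 groups=2 vaddT(a, b, c)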

bjarthur commented on May 25, 2024

thanks for the help, and sorry for the dumb questions.

to clarify your previous comment about specializing: that is not expected to work either, i guess? at least i can't get it to:

julia> using Metal

julia> a = MtlArray([1,1,1,1f0]); b = MtlArray([2,2,2,2f0]); c = similar(a);

julia> function vadd1(a, b, c)  # specialize on T
           function kernel1(::Type{T}, a, b, c) where T
               i = thread_position_in_grid_1d()
               c[i] = a[i] + T(b[i])
               return
           end
           T = Float32
           @metal threads=2 groups=2 kernel1(T, a, b, c)
       end
vadd1 (generic function with 1 method)

julia> vadd1(a, b, c)
ERROR: AssertionError: isbits(arg)
Stacktrace:
 [1] (::Metal.var"#43#45"{MTLSize, MTLSize, Metal.HostKernel{var"#kernel1#1", Tuple{Type{Float32}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}}}, Tuple{DataType, MtlVector{Float32}, MtlVector{Float32}, MtlVector{Float32}}, Vector{MTLBuffer}})(cce::Metal.MTL.MTLComputeCommandEncoderInstance)
   @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:216
 [2] MTLComputeCommandEncoder(f::Metal.var"#43#45"{MTLSize, MTLSize, Metal.HostKernel{var"#kernel1#1", Tuple{Type{Float32}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}}}, Tuple{DataType, MtlVector{Float32}, MtlVector{Float32}, MtlVector{Float32}}, Vector{MTLBuffer}}, cmdbuf::Metal.MTL.MTLCommandBufferInstance; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ Metal.MTL ~/.julia/dev/Metal/lib/mtl/command_enc/compute.jl:44
 [3] MTLComputeCommandEncoder(f::Function, cmdbuf::Metal.MTL.MTLCommandBufferInstance)
   @ Metal.MTL ~/.julia/dev/Metal/lib/mtl/command_enc/compute.jl:41
 [4] (::Metal.HostKernel{var"#kernel1#1", Tuple{Type{Float32}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}}})(::Type, ::Vararg{Any}; groups::Int64, threads::Int64, queue::Metal.MTL.MTLCommandQueueInstance)
   @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:197
 [5] macro expansion
   @ ~/.julia/dev/Metal/src/compiler/execution.jl:79 [inlined]
 [6] vadd1(a::MtlVector{Float32}, b::MtlVector{Float32}, c::MtlVector{Float32})
   @ Main ./REPL[4]:8
 [7] top-level scope
   @ REPL[5]:1
 [8] top-level scope
   @ ~/.julia/dev/Metal/src/initialization.jl:33

so i tried sneaking T in via a closure. no joy:

julia> function vadd2(a, b, c)  # closure with T
           function kernel2(a, b, c)
               i = thread_position_in_grid_1d()
               c[i] = a[i] + T(b[i])
               return
           end 
           T = Float32
           @metal threads=2 groups=2 kernel2(a, b, c)
       end
vadd2 (generic function with 1 method)

julia> vadd2(a, b, c)
ERROR: GPU compilation of kernel2(MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}) in world 32452 failed
KernelError: passing and using non-bitstype argument

Argument 1 to your kernel function is of type var"#kernel2#2", which is not isbits:
  .T is of type Core.Box which is not isbits.
    .contents is of type Any which is not isbits.

Stacktrace:
  [1] check_invocation(job::GPUCompiler.CompilerJob)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/validation.jl:101
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:154 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:152 [inlined]
  [5] emit_julia(job::GPUCompiler.CompilerJob; validate::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/utils.jl:83
  [6] emit_julia
    @ ~/.julia/packages/GPUCompiler/anMCs/src/utils.jl:77 [inlined]
  [7] compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ Metal ~/.julia/dev/Metal/src/compiler/compilation.jl:59
  [8] #39
    @ ~/.julia/dev/Metal/src/compiler/compilation.jl:55 [inlined]
  [9] JuliaContext(f::Metal.var"#39#40"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:76
 [10] compile
    @ ~/.julia/dev/Metal/src/compiler/compilation.jl:54 [inlined]
 [11] actual_compilation(cache::Dict{UInt64, Any}, key::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, ft::Type, tt::Type, world::UInt64, compiler::typeof(Metal.compile), linker::typeof(Metal.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/cache.jl:184
 [12] cached_compilation(cache::Dict{UInt64, Any}, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, ft::Type, tt::Type, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/cache.jl:163
 [13] macro expansion
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:161 [inlined]
 [14] macro expansion
    @ ./lock.jl:223 [inlined]
 [15] mtlfunction(f::var"#kernel2#2", tt::Type{Tuple{MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:157
 [16] mtlfunction
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:155 [inlined]
 [17] macro expansion
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:77 [inlined]
 [18] vadd2(a::MtlVector{Float32}, b::MtlVector{Float32}, c::MtlVector{Float32})
    @ Main ./REPL[6]:8
 [19] top-level scope
    @ REPL[7]:1
 [20] top-level scope
    @ ~/.julia/dev/Metal/src/initialization.jl:33

only when T is defined inside the kernel does it work:

julia> function vadd3(a, b, c)
           function kernel3(a, b, c)
               T = Float32
               i = thread_position_in_grid_1d()
               c[i] = a[i] + T(b[i])
               return
           end
           @metal threads=2 groups=2 kernel3(a, b, c)
       end
vadd3 (generic function with 1 method)

julia> vadd3(a, b, c)
Metal.HostKernel{var"#kernel3#3", Tuple{MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}}}(var"#kernel3#3"(), Metal.MTL.MTLComputePipelineStateInstance (object of type AGXG14XFamilyComputePipeline))

i was surprised that setting T, inside the kernel, to a function of the outer function's type parameters worked:

julia> function vadd4(a::MtlVector{Ta}, b::MtlVector{Tb}, c::MtlVector{Tc}) where {Ta,Tb,Tc}
           function kernel4(a, b, c)
               T = promote_type(Ta,Tb,Tc)
               i = thread_position_in_grid_1d()
               c[i] = a[i] + T(b[i])
               return
           end
           @metal threads=2 groups=2 kernel4(a, b, c)
       end
vadd4 (generic function with 1 method)

julia> vadd4(a, b, c)
Metal.HostKernel{var"#kernel4#4"{Float32, Float32, Float32}, Tuple{MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}}}(var"#kernel4#4"{Float32, Float32, Float32}(), Metal.MTL.MTLComputePipelineStateInstance (object of type AGXG14XFamilyComputePipeline))

but i worry that doing so inside the kernel is slow. or i guess that is computed during compilation, not at run time?
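
(a quick host-side sanity check, just a sketch with a throwaway helper, suggests the promote_type call constant-folds, so presumably it costs nothing at run time inside the kernel either:)

# throwaway helper: promote_type over type parameters should constant-fold
ptype(::Type{Ta}, ::Type{Tb}, ::Type{Tc}) where {Ta,Tb,Tc} = promote_type(Ta, Tb, Tc)
@code_typed ptype(Float32, Float32, Float32)  # body should reduce to `return Float32`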

my original CUDA function, which i'm trying to refactor to Metal (and eventually KernelAbstractions), computes T from the type parameters outside the kernel and works there. the Metal equivalent below, however, does not:

julia> function vadd5(a::MtlVector{Ta}, b::MtlVector{Tb}, c::MtlVector{Tc}) where {Ta,Tb,Tc}
           function kernel5(a, b, c)
               i = thread_position_in_grid_1d()
               c[i] = a[i] + T(b[i])
               return
           end
           T = promote_type(Ta,Tb,Tc)
           @metal threads=2 groups=2 kernel5(a, b, c)
       end
vadd5 (generic function with 1 method)

julia> vadd5(a, b, c)  # KernelError: passing and using non-bitstype argument
ERROR: GPU compilation of kernel5(MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}) in world 32458 failed
KernelError: passing and using non-bitstype argument

Argument 1 to your kernel function is of type var"#kernel5#5", which is not isbits:
  .T is of type Core.Box which is not isbits.
    .contents is of type Any which is not isbits.

Stacktrace:
  [1] check_invocation(job::GPUCompiler.CompilerJob)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/validation.jl:101
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:154 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:152 [inlined]
  [5] emit_julia(job::GPUCompiler.CompilerJob; validate::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/utils.jl:83
  [6] emit_julia
    @ ~/.julia/packages/GPUCompiler/anMCs/src/utils.jl:77 [inlined]
  [7] compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ Metal ~/.julia/dev/Metal/src/compiler/compilation.jl:59
  [8] #39
    @ ~/.julia/dev/Metal/src/compiler/compilation.jl:55 [inlined]
  [9] JuliaContext(f::Metal.var"#39#40"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:76
 [10] compile
    @ ~/.julia/dev/Metal/src/compiler/compilation.jl:54 [inlined]
 [11] actual_compilation(cache::Dict{UInt64, Any}, key::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, ft::Type, tt::Type, world::UInt64, compiler::typeof(Metal.compile), linker::typeof(Metal.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/cache.jl:184
 [12] cached_compilation(cache::Dict{UInt64, Any}, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, ft::Type, tt::Type, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/cache.jl:163
 [13] macro expansion
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:161 [inlined]
 [14] macro expansion
    @ ./lock.jl:223 [inlined]
 [15] mtlfunction(f::var"#kernel5#5", tt::Type{Tuple{MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:157
 [16] mtlfunction
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:155 [inlined]
 [17] macro expansion
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:77 [inlined]
 [18] vadd5(a::MtlVector{Float32}, b::MtlVector{Float32}, c::MtlVector{Float32})
    @ Main ./REPL[12]:8
 [19] top-level scope
    @ REPL[13]:1
 [20] top-level scope
    @ ~/.julia/dev/Metal/src/initialization.jl:33

maleadt commented on May 25, 2024

The issue with vadd1 is a bug indeed, thanks for reiterating. I'll see about fixing it.

vadd2 (and vadd5) doesn't work because T is a variable that's captured, whereas in vadd4 the closure only captures the typevars Ta, Tb and Tc, which Julia can specialize your kernel on.
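
You can see that capture on the host with a small sketch (throwaway names): a variable assigned after the closure is defined gets wrapped in a Core.Box, which is exactly what the KernelError reports as non-isbits:

function outer()
    kernel() = T      # the closure is defined before T is assigned...
    T = Float32       # ...so the captured T ends up in a Core.Box
    return kernel
end

fieldtypes(typeof(outer()))  # (Core.Box,) -- not isbits, hence the KernelError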

vadd3 works because you don't pass the non-isbits type T as an argument, but only have it as a constant in your kernel (again, allowing Julia to specialize on it).

As an alternative to vadd4 you can also get T from any of the kernel arguments:

function vadd(a, b, c)
    function kernel1(a::AbstractVector{T}, b, c) where T
        i = thread_position_in_grid_1d()
        c[i] = T(a[i]) + T(b[i])
        return
    end
    @metal threads=2 groups=2 kernel1(a, b, c)
end

The problems you're getting are simply because you're trying to pass actual types, which aren't isbits objects, instead of typevars Julia can specialize on. There's a very big difference between the two, even though the syntax looks similar.
