Comments (7)

maleadt commented on May 25, 2024

#135 should make vadd1 possible.

maleadt commented on May 25, 2024

You're actually passing a type, while you should be specializing on it instead (kernel(::Type{T}) where T).

bjarthur commented on May 25, 2024

thanks for the quick reply. that gives the same error though.

i think the fundamental problem is that type conversion is not allowed in a kernel: the modified function below, which does NOT take a type argument nor specialize, but simply calls Float32 directly, gives the same error.

julia> using Metal

julia> function vaddT(a, b, c)
           i = thread_position_in_grid_1d()
           c[i] = a[i] + Float32(b[i])
           return
       end
vaddT (generic function with 1 method)

julia> a = MtlArray([1,1,1,1]); b = MtlArray([2,2,2,2]); c = similar(a);

julia> @metal threads=2 groups=2 vaddT(a, b, c)
ERROR: InvalidIRError: compiling kernel #vaddT(MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to gpu_malloc)
Stacktrace:
 [1] malloc
   @ ~/.julia/packages/GPUCompiler/S3TWf/src/runtime.jl:89
 [2] macro expansion
   @ ~/.julia/packages/GPUCompiler/S3TWf/src/runtime.jl:184
 [3] macro expansion
   @ ./none:0
 [4] box
   @ ./none:0
 [5] box_float32
   @ ~/.julia/packages/GPUCompiler/S3TWf/src/runtime.jl:213
 [6] Int64
   @ ./float.jl:788
 [7] convert
   @ ./number.jl:7
 [8] setindex!
   @ ~/.julia/dev/Metal/src/device/array.jl:105
 [9] vaddT
   @ ./REPL[2]:3
Hint: catch this exception as `err` and call `code_typed(err; interactive = true)` to introspect the erronous code with Cthulhu.jl
Stacktrace:
  [1] check_ir(job::GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams, GPUCompiler.FunctionSpec{typeof(vaddT), Tuple{MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}}}}, args::LLVM.Module)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/validation.jl:141
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/S3TWf/src/driver.jl:418 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/S3TWf/src/driver.jl:417 [inlined]
  [5] emit_llvm(job::GPUCompiler.CompilerJob, method_instance::Any; libraries::Bool, deferred_codegen::Bool, optimize::Bool, cleanup::Bool, only_entry::Bool, validate::Bool, ctx::LLVM.Context)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/utils.jl:83
  [6] mtlfunction_compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:166
  [7] #40
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:161 [inlined]
  [8] JuliaContext(f::Metal.var"#40#41"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams, GPUCompiler.FunctionSpec{typeof(vaddT), Tuple{MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}}}}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/driver.jl:76
  [9] mtlfunction_compile(job::GPUCompiler.CompilerJob)
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:160
 [10] cached_compilation(cache::Dict{UInt64, Any}, job::GPUCompiler.CompilerJob, compiler::typeof(Metal.mtlfunction_compile), linker::typeof(Metal.mtlfunction_link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/S3TWf/src/cache.jl:90
 [11] mtlfunction(f::typeof(vaddT), tt::Type{Tuple{MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:148
 [12] mtlfunction(f::typeof(vaddT), tt::Type{Tuple{MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}}})
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:141
 [13] top-level scope
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:64
 [14] top-level scope
    @ ~/.julia/dev/Metal/src/initialization.jl:33

bjarthur commented on May 25, 2024

more debugging data, which may or may not be related / informative: outside of a kernel, broadcasting Float32 works, but broadcasting sin on its own does not:

julia> Float32.(b)
4-element MtlVector{Float32}:
 2.0
 2.0
 2.0
 2.0

julia> sin.(Array(b))
4-element Vector{Float64}:
 0.9092974268256817
 0.9092974268256817
 0.9092974268256817
 0.9092974268256817

julia> sin.(Float32.(b))
4-element MtlVector{Float32}:
 0.9092974
 0.9092974
 0.9092974
 0.9092974

julia> sin.(b)
ERROR: InvalidIRError: compiling kernel #broadcast_kernel#28(Metal.mtlKernelContext, MtlDeviceVector{Float64, 1}, Base.Broadcast.Broadcasted{Metal.MtlArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(sin), Tuple{Base.Broadcast.Extruded{MtlDeviceVector{Int64, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64) resulted in invalid LLVM IR
Reason: unsupported use of double floating-point value
Stacktrace:
 [1] reinterpret
   @ ./essentials.jl:438
 [2] paynehanek
   @ ./special/rem_pio2.jl:139
 [3] rem_pio2_kernel
   @ ./special/rem_pio2.jl:282
 [4] sin
   @ ./special/trig.jl:41
 [5] sin
   @ ./math.jl:1372
 [6] _broadcast_getindex_evalf
   @ ./broadcast.jl:670
 [7] _broadcast_getindex
   @ ./broadcast.jl:643
 [8] getindex
   @ ./broadcast.jl:597
 [9] broadcast_kernel
   @ ~/.julia/packages/GPUArrays/6STCb/src/host/broadcast.jl:59

maleadt commented on May 25, 2024

You are mixing different bug reports in a single issue; please keep them separate (also, for questions like this, Discourse is better suited).


julia> sin.(b)
ERROR: InvalidIRError: compiling kernel #broadcast_kernel#28(Metal.mtlKernelContext, MtlDeviceVector{Float64, 1}, Base.Broadcast.Broadcasted{Metal.MtlArrayStyle{1}, Tuple{Base.OneTo{Int64}}, typeof(sin), Tuple{Base.Broadcast.Extruded{MtlDeviceVector{Int64, 1}, Tuple{Bool}, Tuple{Int64}}}}, Int64) resulted in invalid LLVM IR
Reason: unsupported use of double floating-point value

Works for me:

julia> b = rand(Float32, 1)
1-element Vector{Float32}:
 0.9079935

julia> sin.(MtlArray(b))
1-element MtlVector{Float32}:
 0.78827065

ERROR: InvalidIRError: compiling kernel #vaddT(MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}, MtlDeviceVector{Int64, 1}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to gpu_malloc)

That's because Metal currently does not support exceptions (#69), and assigning a Float32 to an Int array (like you're doing in your kernel) can throw an InexactError.
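
One way around it for this particular kernel is to avoid the fallible conversion altogether, e.g. by giving c a Float32 element type (just a sketch, untested):

# sketch: with a Float32 output array, setindex! stores a Float32 directly and
# the Float32 -> Int64 conversion (and its exception path) never runs
a = MtlArray([1, 1, 1, 1])
b = MtlArray([2, 2, 2, 2])
c = MtlArray(zeros(Float32, 4))
@metal threads=2 groups=2 vaddT(a, b, c)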

bjarthur commented on May 25, 2024

thanks for the help, and sorry for the dumb questions.

to clarify your previous comment about specializing: that is not expected to work either, i guess? at least i can't get it to:

julia> using Metal

julia> a = MtlArray([1,1,1,1f0]); b = MtlArray([2,2,2,2f0]); c = similar(a);

julia> function vadd1(a, b, c)  # specialize on T
           function kernel1(::Type{T}, a, b, c) where T
               i = thread_position_in_grid_1d()
               c[i] = a[i] + T(b[i])
               return
           end
           T = Float32
           @metal threads=2 groups=2 kernel1(T, a, b, c)
       end
vadd1 (generic function with 1 method)

julia> vadd1(a, b, c)
ERROR: AssertionError: isbits(arg)
Stacktrace:
 [1] (::Metal.var"#43#45"{MTLSize, MTLSize, Metal.HostKernel{var"#kernel1#1", Tuple{Type{Float32}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}}}, Tuple{DataType, MtlVector{Float32}, MtlVector{Float32}, MtlVector{Float32}}, Vector{MTLBuffer}})(cce::Metal.MTL.MTLComputeCommandEncoderInstance)
   @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:216
 [2] MTLComputeCommandEncoder(f::Metal.var"#43#45"{MTLSize, MTLSize, Metal.HostKernel{var"#kernel1#1", Tuple{Type{Float32}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}}}, Tuple{DataType, MtlVector{Float32}, MtlVector{Float32}, MtlVector{Float32}}, Vector{MTLBuffer}}, cmdbuf::Metal.MTL.MTLCommandBufferInstance; kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
   @ Metal.MTL ~/.julia/dev/Metal/lib/mtl/command_enc/compute.jl:44
 [3] MTLComputeCommandEncoder(f::Function, cmdbuf::Metal.MTL.MTLCommandBufferInstance)
   @ Metal.MTL ~/.julia/dev/Metal/lib/mtl/command_enc/compute.jl:41
 [4] (::Metal.HostKernel{var"#kernel1#1", Tuple{Type{Float32}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}}})(::Type, ::Vararg{Any}; groups::Int64, threads::Int64, queue::Metal.MTL.MTLCommandQueueInstance)
   @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:197
 [5] macro expansion
   @ ~/.julia/dev/Metal/src/compiler/execution.jl:79 [inlined]
 [6] vadd1(a::MtlVector{Float32}, b::MtlVector{Float32}, c::MtlVector{Float32})
   @ Main ./REPL[4]:8
 [7] top-level scope
   @ REPL[5]:1
 [8] top-level scope
   @ ~/.julia/dev/Metal/src/initialization.jl:33

so i tried sneaking T in via a closure. no joy:

julia> function vadd2(a, b, c)  # closure with T
           function kernel2(a, b, c)
               i = thread_position_in_grid_1d()
               c[i] = a[i] + T(b[i])
               return
           end 
           T = Float32
           @metal threads=2 groups=2 kernel2(a, b, c)
       end
vadd2 (generic function with 1 method)

julia> vadd2(a, b, c)
ERROR: GPU compilation of kernel2(MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}) in world 32452 failed
KernelError: passing and using non-bitstype argument

Argument 1 to your kernel function is of type var"#kernel2#2", which is not isbits:
  .T is of type Core.Box which is not isbits.
    .contents is of type Any which is not isbits.

Stacktrace:
  [1] check_invocation(job::GPUCompiler.CompilerJob)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/validation.jl:101
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:154 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:152 [inlined]
  [5] emit_julia(job::GPUCompiler.CompilerJob; validate::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/utils.jl:83
  [6] emit_julia
    @ ~/.julia/packages/GPUCompiler/anMCs/src/utils.jl:77 [inlined]
  [7] compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ Metal ~/.julia/dev/Metal/src/compiler/compilation.jl:59
  [8] #39
    @ ~/.julia/dev/Metal/src/compiler/compilation.jl:55 [inlined]
  [9] JuliaContext(f::Metal.var"#39#40"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:76
 [10] compile
    @ ~/.julia/dev/Metal/src/compiler/compilation.jl:54 [inlined]
 [11] actual_compilation(cache::Dict{UInt64, Any}, key::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, ft::Type, tt::Type, world::UInt64, compiler::typeof(Metal.compile), linker::typeof(Metal.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/cache.jl:184
 [12] cached_compilation(cache::Dict{UInt64, Any}, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, ft::Type, tt::Type, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/cache.jl:163
 [13] macro expansion
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:161 [inlined]
 [14] macro expansion
    @ ./lock.jl:223 [inlined]
 [15] mtlfunction(f::var"#kernel2#2", tt::Type{Tuple{MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:157
 [16] mtlfunction
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:155 [inlined]
 [17] macro expansion
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:77 [inlined]
 [18] vadd2(a::MtlVector{Float32}, b::MtlVector{Float32}, c::MtlVector{Float32})
    @ Main ./REPL[6]:8
 [19] top-level scope
    @ REPL[7]:1
 [20] top-level scope
    @ ~/.julia/dev/Metal/src/initialization.jl:33

only when T is defined inside the kernel does it work:

julia> function vadd3(a, b, c)
           function kernel3(a, b, c)
               T = Float32
               i = thread_position_in_grid_1d()
               c[i] = a[i] + T(b[i])
               return
           end
           @metal threads=2 groups=2 kernel3(a, b, c)
       end
vadd3 (generic function with 1 method)

julia> vadd3(a, b, c)
Metal.HostKernel{var"#kernel3#3", Tuple{MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}}}(var"#kernel3#3"(), Metal.MTL.MTLComputePipelineStateInstance (object of type AGXG14XFamilyComputePipeline))

i was surprised that setting T, inside the kernel, to a function of the outer function's type parameters worked:

julia> function vadd4(a::MtlVector{Ta}, b::MtlVector{Tb}, c::MtlVector{Tc}) where {Ta,Tb,Tc}
           function kernel4(a, b, c)
               T = promote_type(Ta,Tb,Tc)
               i = thread_position_in_grid_1d()
               c[i] = a[i] + T(b[i])
               return
           end
           @metal threads=2 groups=2 kernel4(a, b, c)
       end
vadd4 (generic function with 1 method)

julia> vadd4(a, b, c)
Metal.HostKernel{var"#kernel4#4"{Float32, Float32, Float32}, Tuple{MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}}}(var"#kernel4#4"{Float32, Float32, Float32}(), Metal.MTL.MTLComputePipelineStateInstance (object of type AGXG14XFamilyComputePipeline))

but i worry that doing so inside the kernel is slow. or i guess that is computed during compilation, not at run time?
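
(a quick host-side sanity check, just a sketch with a throwaway helper, suggests the promote_type call constant-folds, so presumably it costs nothing at run time inside the kernel either:)

# throwaway helper: promote_type over type parameters should constant-fold
ptype(::Type{Ta}, ::Type{Tb}, ::Type{Tc}) where {Ta,Tb,Tc} = promote_type(Ta, Tb, Tc)
@code_typed ptype(Float32, Float32, Float32)  # body should reduce to `return Float32`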

my original CUDA function, which i'm trying to refactor to Metal (and eventually KernelAbstractions), computes T from the type parameters outside the kernel and works there. the Metal equivalent below, however, does not:

julia> function vadd5(a::MtlVector{Ta}, b::MtlVector{Tb}, c::MtlVector{Tc}) where {Ta,Tb,Tc}
           function kernel5(a, b, c)
               i = thread_position_in_grid_1d()
               c[i] = a[i] + T(b[i])
               return
           end
           T = promote_type(Ta,Tb,Tc)
           @metal threads=2 groups=2 kernel5(a, b, c)
       end
vadd5 (generic function with 1 method)

julia> vadd5(a, b, c)  # KernelError: passing and using non-bitstype argument
ERROR: GPU compilation of kernel5(MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}) in world 32458 failed
KernelError: passing and using non-bitstype argument

Argument 1 to your kernel function is of type var"#kernel5#5", which is not isbits:
  .T is of type Core.Box which is not isbits.
    .contents is of type Any which is not isbits.

Stacktrace:
  [1] check_invocation(job::GPUCompiler.CompilerJob)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/validation.jl:101
  [2] macro expansion
    @ ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:154 [inlined]
  [3] macro expansion
    @ ~/.julia/packages/TimerOutputs/LHjFw/src/TimerOutput.jl:253 [inlined]
  [4] macro expansion
    @ ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:152 [inlined]
  [5] emit_julia(job::GPUCompiler.CompilerJob; validate::Bool)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/utils.jl:83
  [6] emit_julia
    @ ~/.julia/packages/GPUCompiler/anMCs/src/utils.jl:77 [inlined]
  [7] compile(job::GPUCompiler.CompilerJob, ctx::LLVM.Context)
    @ Metal ~/.julia/dev/Metal/src/compiler/compilation.jl:59
  [8] #39
    @ ~/.julia/dev/Metal/src/compiler/compilation.jl:55 [inlined]
  [9] JuliaContext(f::Metal.var"#39#40"{GPUCompiler.CompilerJob{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}})
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/driver.jl:76
 [10] compile
    @ ~/.julia/dev/Metal/src/compiler/compilation.jl:54 [inlined]
 [11] actual_compilation(cache::Dict{UInt64, Any}, key::UInt64, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, ft::Type, tt::Type, world::UInt64, compiler::typeof(Metal.compile), linker::typeof(Metal.link))
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/cache.jl:184
 [12] cached_compilation(cache::Dict{UInt64, Any}, cfg::GPUCompiler.CompilerConfig{GPUCompiler.MetalCompilerTarget, Metal.MetalCompilerParams}, ft::Type, tt::Type, compiler::Function, linker::Function)
    @ GPUCompiler ~/.julia/packages/GPUCompiler/anMCs/src/cache.jl:163
 [13] macro expansion
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:161 [inlined]
 [14] macro expansion
    @ ./lock.jl:223 [inlined]
 [15] mtlfunction(f::var"#kernel5#5", tt::Type{Tuple{MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}, MtlDeviceVector{Float32, 1}}}; name::Nothing, kwargs::Base.Pairs{Symbol, Union{}, Tuple{}, NamedTuple{(), Tuple{}}})
    @ Metal ~/.julia/dev/Metal/src/compiler/execution.jl:157
 [16] mtlfunction
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:155 [inlined]
 [17] macro expansion
    @ ~/.julia/dev/Metal/src/compiler/execution.jl:77 [inlined]
 [18] vadd5(a::MtlVector{Float32}, b::MtlVector{Float32}, c::MtlVector{Float32})
    @ Main ./REPL[12]:8
 [19] top-level scope
    @ REPL[13]:1
 [20] top-level scope
    @ ~/.julia/dev/Metal/src/initialization.jl:33

maleadt commented on May 25, 2024

The issue with vadd1 is a bug indeed, thanks for reiterating. I'll see about fixing it.

vadd2 (and vadd5) doesn't work because T is a variable that's captured, whereas in vadd4 the closure only captures the typevars Ta, Tb and Tc, which Julia can specialize your kernel on.
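
You can see that capture on the host with a small sketch (throwaway names): a variable assigned after the closure is defined gets wrapped in a Core.Box, which is exactly what the KernelError reports as non-isbits:

function outer()
    kernel() = T      # the closure is defined before T is assigned...
    T = Float32       # ...so the captured T ends up in a Core.Box
    return kernel
end

fieldtypes(typeof(outer()))  # (Core.Box,) -- not isbits, hence the KernelError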

vadd3 works because you don't pass the non-isbits type T as an argument, but only have it as a constant in your kernel (again, allowing Julia to specialize on it).

As an alternative to vadd4 you can also get T from any of the kernel arguments:

function vadd(a, b, c)
    function kernel1(a::AbstractVector{T}, b, c) where T
        i = thread_position_in_grid_1d()
        c[i] = T(a[i]) + T(b[i])
        return
    end
    @metal threads=2 groups=2 kernel1(a, b, c)
end

The problems you're getting are simply because you're trying to pass actual types, which aren't isbits objects, instead of typevars Julia can specialize on. There's a very big difference between the two, even though the syntax looks similar.
