Comments (4)

maleadt avatar maleadt commented on September 21, 2024

I guess this is essentially a dup of #69. The question is why an exception is being generated here, as we do support sincos:

@device_override FastMath.sincos_fast(x::Float32) = ccall("extern air.fast_sincos.f32", llvmcall, Cfloat, (Cfloat,), x)
@device_override Base.sincos(x::Float32) = ccall("extern air.sincos.f32", llvmcall, Cfloat, (Cfloat,), x)
@device_override Base.sincos(x::Float16) = ccall("extern air.sincos.f16", llvmcall, Float16, (Float16,), x)

Can you see using Cthulhu how sincos is invoked?

from metal.jl.

fjebaker avatar fjebaker commented on September 21, 2024

Ah, sorry I didn't notice this the first time. It turns out it's related to indexed_iterate:

New MWE:

using Metal, KernelAbstractions

X = Metal.MtlArray(fill(0.3f0, 128))
Y = copy(X)

@kernel function mwe_kernel_sincos(out, a)
    I = @index(Global, Linear)
    s, c = sincos(a[I])
    out[I] = s + c
end

kernel = mwe_kernel_sincos(Metal.MetalBackend())
kernel(Y, X, ndrange = size(Y))
ERROR: InvalidIRError: compiling MethodInstance for gpu_mwe_kernel_sincos(::KernelAbstractions.CompilerMetadata{KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicCheck, Nothing, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, KernelAbstractions.NDIteration.NDRange{1, KernelAbstractions.NDIteration.DynamicSize, KernelAbstractions.NDIteration.DynamicSize, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}, CartesianIndices{1, Tuple{Base.OneTo{Int64}}}}}, ::MtlDeviceVector{Float32, 1}, ::MtlDeviceVector{Float32, 1}) resulted in invalid LLVM IR
Reason: unsupported call to an unknown function (call to gpu_malloc)
Stacktrace:
 [1] malloc
   @ ~/.julia/packages/GPUCompiler/YO8Uj/src/runtime.jl:88
 [2] macro expansion
   @ ~/.julia/packages/GPUCompiler/YO8Uj/src/runtime.jl:183
 [3] macro expansion
   @ ./none:0
 [4] box
   @ ./none:0
 [5] box_int64
   @ ~/.julia/packages/GPUCompiler/YO8Uj/src/runtime.jl:212
 [6] indexed_iterate                                                            <--
   @ ./tuple.jl:97
 [7] macro expansion
   @ ~/Developer/jl-forward-diff/mwe.jl:9
 [8] gpu_mwe_kernel_sincos
   @ ~/.julia/packages/KernelAbstractions/cWlFz/src/macros.jl:90
 [9] gpu_mwe_kernel_sincos
   @ ./none:0

This throws ostensibly the same error as above. So, trying this instead:

@kernel function mwe_kernel_sincos(out, a)
    I = @index(Global, Linear)
    k = sincos(a[I])
    out[I] = k[1] + k[2]
end

This throws no error, but only k[1] is non-zero. That is, k[2] doesn't seem to have a value at all?

fjebaker avatar fjebaker commented on September 21, 2024

It seems to me that the Metal sincos only returns a single float, which is the sin part? @*code_warntype confirms this with the external calls?

Edit: some examples

Kernel:

@kernel function mwe_kernel_sincos(out, a)
    I = @index(Global, Linear)
    k = sincos(a[I])
    out[I] = k[1]
end

41 ─ %119 = $(Expr(:foreigncall, "extern air.sincos.f32", Float32, svec(Float32), 0, :(:llvmcall), :(%116), :(%116)))::Float32
└───        goto #43 if not true
42 ─        nothing::Nothing
43 ┄        goto #44
44 ─        goto #49 if not true
45 ─ %124 = Core.tuple(%92)::Tuple{UInt32}
│    %125 = Base.getfield(out, :shape)::Tuple{Int64}
│    %126 = Base.getfield(%125, 1, true)::Int64

Kernel:

@kernel function mwe_kernel_sincos(out, a)
    I = @index(Global, Linear)
    k = sincos(a[I])
    out[I] = k[2]
end

41 ─ %119 = $(Expr(:foreigncall, "extern air.sincos.f32", Float32, svec(Float32), 0, :(:llvmcall), :(%116), :(%116)))::Float32
└───        goto #43 if not true
42 ─        Metal.throw(Metal.nothing)::Union{}
└───        unreachable
43 ─        goto #44
44 ─        goto #49 if not true
45 ─ %125 = Core.tuple(%92)::Tuple{UInt32}
│    %126 = Base.getfield(out, :shape)::Tuple{Int64}
│    %127 = Base.getfield(%126, 1, true)::Int64
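
The behaviour in both kernels can probably be reproduced on the CPU without Metal. The sketch below assumes the override effectively returns a single Float32. `broken_sincos` is a hypothetical stand-in for it, not a Metal.jl function. Julia treats a scalar as a one-element collection, so `k[1]` succeeds while `k[2]` and the destructuring `s, c = ...` (which lowers to `Base.indexed_iterate`) both throw a BoundsError. On the GPU that throw path is likely what pulls in the boxing and `gpu_malloc` call seen in the stack trace.

```julia
# Hypothetical stand-in for the broken override: returns one Float32
# instead of a (sin, cos) tuple.
broken_sincos(x::Float32) = sin(x)

k = broken_sincos(0.3f0)

# A Number indexes like a one-element collection, so k[1] is just k.
first_ok = k[1] == k

# Index 2 is out of bounds for a scalar and throws.
second_throws = try
    k[2]
    false
catch err
    err isa BoundsError
end

# Destructuring calls Base.indexed_iterate, which fails on the second element.
destructure_throws = try
    s, c = broken_sincos(0.3f0)
    false
catch err
    err isa BoundsError
end

println((first_ok, second_throws, destructure_throws))
```

This matches the typed IR above: the `k[1]` kernel compiles to a no-op branch, whereas the `k[2]` kernel contains an unconditional `Metal.throw`.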

fjebaker avatar fjebaker commented on September 21, 2024

From the Metal developer API:

[Screenshot: Metal documentation entry for sincos]

So changing the override to

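@device_override function Base.sincos(x::Float32)
    c = Ref{Cfloat}()  # out-parameter that receives the cosine
    # the intrinsic returns the sine and writes the cosine through the pointer
    s = ccall("extern air.sincos.f32", llvmcall, Cfloat, (Cfloat, Ptr{Cfloat}), x, c)
    (s, c[])
end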

fixes everything.
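
One way to check the change on a device is to compare against the CPU result. This is a sketch rather than part of the actual fix: it assumes a Metal-capable machine with the corrected override loaded, and the kernel name `check_sincos` and the arrays `S` and `C` are made up for illustration.

```julia
using Metal, KernelAbstractions

# Write sine and cosine to separate outputs so each half can be checked.
@kernel function check_sincos(out_s, out_c, a)
    I = @index(Global, Linear)
    s, c = sincos(a[I])
    out_s[I] = s
    out_c[I] = c
end

backend = Metal.MetalBackend()
X = Metal.MtlArray(fill(0.3f0, 128))
S = similar(X)
C = similar(X)

kernel = check_sincos(backend)
kernel(S, C, X, ndrange = size(X))
KernelAbstractions.synchronize(backend)

# Reference values computed on the CPU.
s_ref, c_ref = sincos(0.3f0)
@assert all(Array(S) .≈ s_ref)
@assert all(Array(C) .≈ c_ref)
```

With the original single-return override, the destructuring line should fail to compile with the InvalidIRError shown earlier, so a clean run of this script suggests both values are being returned.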

I will open a PR with the fixes :)
