Giter VIP home page Giter VIP logo

Comments (7)

mcabbott avatar mcabbott commented on May 25, 2024

There should indeed be a way to set this. I wasn't sure whether to make it a function like zero, or a value like init=0.0, the pattern from reduce etc. Writing @tullio (&,false) b[i] := ... isn't so ugly either.

Another possibility is that there ought to be a function zero_under(+, Float64) == 0.0 but I don't know of one.

It could also catch a few more functions from Base, are there other candidates besides & and |?

from tullio.jl.

N5N3 avatar N5N3 commented on May 25, 2024

as far as i know, the reducedim.jl in Base only has:

for (Op, initfun) in ((:(typeof(add_sum)), :zero), (:(typeof(mul_prod)), :one))
    @eval initarray!(a::AbstractArray{T}, ::$(Op), init::Bool, src::AbstractArray) where {T} = (init && fill!(a, $(initfun)(T)); a)
end

for Op in (:(typeof(max)), :(typeof(min)))
    @eval initarray!(a::AbstractArray{T}, ::$(Op), init::Bool, src::AbstractArray) where {T} = (init && copyfirst!(a, src); a)
end

for (Op, initval) in ((:(typeof(&)), true), (:(typeof(|)), false))
    @eval initarray!(a::AbstractArray, ::$(Op), init::Bool, src::AbstractArray) = (init && fill!(a, $initval); a)
end

I opened this issue because I tried use StructArrays.jl and Tullio.jl together to merge three "@tullio"s into one, like:

        Result = StructArray((X1_sum, X2_sum, F_sum))
        @tullio Result[i] += (
            abs2(x1_fft[i, j]),
            abs2(x2_fft[i, j]),
            x1_fft[i, j] * conj(x2_fft[i, j]),
        )

instead of

        @tullio X1_sum[i] += abs2(x1_fft[i,j])
        @tullio X2_sum[i] += abs2(x2_fft[i,j])
        @tullio F_sum[i] += x1_fft[i,j] * conj(x2_fft[i,j])

for some acceleration, at the cost of "Base pollution" with dispatched Base.:+ and Base.zero.
I tried use accumu_fun(a,b) instead of +(a,b), but i find the dispatched zero(T) is unavoided, and there seems no way to turn on the keep.

from tullio.jl.

mcabbott avatar mcabbott commented on May 25, 2024

I see, and after defining Base.zero(T::Type{<:Tuple}) = map(zero, T.parameters) (why isn't that in Base?) and similar +, do you see acceleration from doing these loops together?

To be clear, there is no feature to control this at the moment, neither initialisation nor keeping values (except for += and *=). It's something I haven't got around to building.

from tullio.jl.

N5N3 avatar N5N3 commented on May 25, 2024
module test
using Tullio
using LoopVectorization
using StructArrays
# dispatch the Base.:+ and Base.zero
const ST = Tuple{T,T,Complex{T}} where {T}
Base.:+(x::ST, y::ST) = (x[1] + y[1], x[2] + y[2], x[3] + y[3])
Base.zero(::Type{Tuple{T,T,Complex{T}}}) where {T} =
    (zero(T), zero(T), zero(Complex{T}))
# 3 @tullio
func1(a,b,x,y,z) = begin
    @tullio x[i] = abs2(a[i,j])
    @tullio y[i] = abs2(b[i,j])
    @tullio z[i] = a[i,j] * conj(b[i,j])
end
# merged
func2(a,b,x,y,z) = begin
    result = StructArray((x,y,z))
    @tullio result[i] = (abs2(a[i,j]),abs2(b[i,j]),a[i,j]*conj(b[i,j]))
end
end
a = randn(ComplexF64,4000,4000)
b = randn(ComplexF64,4000,4000)
x = randn(Float64,4000)
y = randn(Float64,4000)
z = randn(ComplexF64,4000)
Tullio.TILE[] = 2^21
@btime test.func1($a,$b,$x,$y,$z) # 127.037 ms (150 allocations: 10.41 KiB)
@btime test.func2($a,$b,$x,$y,$z) # 62.509 ms (50 allocations: 3.64 KiB)
Tullio.TILE[] = 2^10
@btime test.func1($a,$b,$x,$y,$z) # 46.956 ms (150 allocations: 10.41 KiB)
@btime test.func2($a,$b,$x,$y,$z) # 27.686 ms (50 allocations: 3.64 KiB)

On my Laptop, it offered about 2x speed-up, as the cost of getindex is halved.
On the other hand, I think @avx will not work for the merged version:

# dispatch has modified
a = randn(Float64,4000,4000)
b = randn(Float64,4000,4000)
x = randn(Float64,4000)
y = randn(Float64,4000)
z = randn(Float64,4000)
Tullio.TILE[] = 2^21
@btime test.func1($a,$b,$x,$y,$z) # 19.890 ms (150 allocations: 10.41 KiB)
@btime test.func2($a,$b,$x,$y,$z) # 44.259 ms (50 allocations: 3.64 KiB)
Tullio.TILE[] = 2^10
@btime test.func1($a,$b,$x,$y,$z) # 19.234 ms (150 allocations: 10.41 KiB)
@btime test.func2($a,$b,$x,$y,$z) # 14.042 ms (50 allocations: 3.64 KiB)

from tullio.jl.

mcabbott avatar mcabbott commented on May 25, 2024

Thanks, I see. And this persists even for small sizes. Agree that @avx doesn't like the tuples, but it doesn't like Vector{ComplexF64} either.

from tullio.jl.

mcabbott avatar mcabbott commented on May 25, 2024

With the PR, you can write:

func3(a,b,x,y,z) = begin
    res = StructArray((x,y,z))
    ++(p,q) = map(+,p,q)
    @tullio (++) res[i] = (abs2(a[i,j]), abs2(b[i,j]), a[i,j]*conj(b[i,j]))  init=(0.0, 0.0, 0.0+0im)
end

from tullio.jl.

N5N3 avatar N5N3 commented on May 25, 2024

It seems that the redunction towards scalar(see thread_scalar()), might use init several times (depend on how many threads it used and whether KEEP is on at first), while thread_halves() will use init at most once.

Thus, a arbitrary user defined init might be dangerous for thread_scalar(), like init = 5 for reduction :+, at least a test like redfun(init,init) == init is needed.

from tullio.jl.

Related Issues (20)

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.