Comments (7)
There should indeed be a way to set this. I wasn't sure whether to make it a function like zero
, or a value like init=0.0
, the pattern from reduce
etc. Writing @tullio (&,false) b[i] := ...
isn't so ugly either.
Another possibility is that there ought to be a function zero_under(+, Float64) == 0.0
but I don't know of one.
It could also catch a few more functions from Base, are there other candidates besides &
and |
?
from tullio.jl.
as far as i know, the reducedim.jl in Base only has:
for (Op, initfun) in ((:(typeof(add_sum)), :zero), (:(typeof(mul_prod)), :one))
@eval initarray!(a::AbstractArray{T}, ::$(Op), init::Bool, src::AbstractArray) where {T} = (init && fill!(a, $(initfun)(T)); a)
end
for Op in (:(typeof(max)), :(typeof(min)))
@eval initarray!(a::AbstractArray{T}, ::$(Op), init::Bool, src::AbstractArray) where {T} = (init && copyfirst!(a, src); a)
end
for (Op, initval) in ((:(typeof(&)), true), (:(typeof(|)), false))
@eval initarray!(a::AbstractArray, ::$(Op), init::Bool, src::AbstractArray) = (init && fill!(a, $initval); a)
end
I opened this issue because I tried use StructArrays.jl and Tullio.jl together to merge three "@tullio"s into one, like:
Result = StructArray((X1_sum, X2_sum, F_sum))
@tullio Result[i] += (
abs2(x1_fft[i, j]),
abs2(x2_fft[i, j]),
x1_fft[i, j] * conj(x2_fft[i, j]),
)
instead of
@tullio X1_sum[i] += abs2(x1_fft[i,j])
@tullio X2_sum[i] += abs2(x2_fft[i,j])
@tullio F_sum[i] += x1_fft[i,j] * conj(x2_fft[i,j])
for some acceleration, at the cost of "Base pollution" with dispatched Base.:+
and Base.zero
.
I tried use accumu_fun(a,b)
instead of +(a,b)
, but i find the dispatched zero(T)
is unavoided, and there seems no way to turn on the keep
.
from tullio.jl.
I see, and after defining Base.zero(T::Type{<:Tuple}) = map(zero, T.parameters)
(why isn't that in Base?) and similar +
, do you see acceleration from doing these loops together?
To be clear, there is no feature to control this at the moment, neither initialisation nor keeping values (except for +=
and *=
). It's something I haven't got around to building.
from tullio.jl.
module test
using Tullio
using LoopVectorization
using StructArrays
# dispatch the Base.:+ and Base.zero
const ST = Tuple{T,T,Complex{T}} where {T}
Base.:+(x::ST, y::ST) = (x[1] + y[1], x[2] + y[2], x[3] + y[3])
Base.zero(::Type{Tuple{T,T,Complex{T}}}) where {T} =
(zero(T), zero(T), zero(Complex{T}))
# 3 @tullio
func1(a,b,x,y,z) = begin
@tullio x[i] = abs2(a[i,j])
@tullio y[i] = abs2(b[i,j])
@tullio z[i] = a[i,j] * conj(b[i,j])
end
# merged
func2(a,b,x,y,z) = begin
result = StructArray((x,y,z))
@tullio result[i] = (abs2(a[i,j]),abs2(b[i,j]),a[i,j]*conj(b[i,j]))
end
end
a = randn(ComplexF64,4000,4000)
b = randn(ComplexF64,4000,4000)
x = randn(Float64,4000)
y = randn(Float64,4000)
z = randn(ComplexF64,4000)
Tullio.TILE[] = 2^21
@btime test.func1($a,$b,$x,$y,$z) # 127.037 ms (150 allocations: 10.41 KiB)
@btime test.func2($a,$b,$x,$y,$z) # 62.509 ms (50 allocations: 3.64 KiB)
Tullio.TILE[] = 2^10
@btime test.func1($a,$b,$x,$y,$z) # 46.956 ms (150 allocations: 10.41 KiB)
@btime test.func2($a,$b,$x,$y,$z) # 27.686 ms (50 allocations: 3.64 KiB)
On my Laptop, it offered about 2x speed-up, as the cost of getindex
is halved.
On the other hand, I think @avx will not work for the merged version:
# dispatch has modified
a = randn(Float64,4000,4000)
b = randn(Float64,4000,4000)
x = randn(Float64,4000)
y = randn(Float64,4000)
z = randn(Float64,4000)
Tullio.TILE[] = 2^21
@btime test.func1($a,$b,$x,$y,$z) # 19.890 ms (150 allocations: 10.41 KiB)
@btime test.func2($a,$b,$x,$y,$z) # 44.259 ms (50 allocations: 3.64 KiB)
Tullio.TILE[] = 2^10
@btime test.func1($a,$b,$x,$y,$z) # 19.234 ms (150 allocations: 10.41 KiB)
@btime test.func2($a,$b,$x,$y,$z) # 14.042 ms (50 allocations: 3.64 KiB)
from tullio.jl.
Thanks, I see. And this persists even for small sizes. Agree that @avx
doesn't like the tuples, but it doesn't like Vector{ComplexF64} either.
from tullio.jl.
With the PR, you can write:
func3(a,b,x,y,z) = begin
res = StructArray((x,y,z))
++(p,q) = map(+,p,q)
@tullio (++) res[i] = (abs2(a[i,j]), abs2(b[i,j]), a[i,j]*conj(b[i,j])) init=(0.0, 0.0, 0.0+0im)
end
from tullio.jl.
It seems that the redunction towards scalar(see thread_scalar()
), might use init
several times (depend on how many threads it used and whether KEEP
is on at first), while thread_halves()
will use init
at most once.
Thus, a arbitrary user defined init
might be dangerous for thread_scalar()
, like init = 5
for reduction :+
, at least a test like redfun(init,init) == init is needed.
from tullio.jl.
Related Issues (20)
- HybridArrays HOT 9
- Runtime dispatch when reducing to scalar
- Alternative to Tullio for Chained Multiplication HOT 4
- @views macro causes module compilation failure HOT 3
- Reporting a bug when Tullio being included with LoopVectorization HOT 1
- [Question] Is it possible to create a vector of SVectors from a Matrix using Tullio? HOT 2
- [Question] How to change summation order? HOT 5
- Use package extensions HOT 1
- How finalizers `|>` work HOT 5
- Method error when broadcast and sum of matrices HOT 1
- GPU Kernel Compilation Failed with Interpolations HOT 2
- Upgrade to CUDA.CUDAKernels HOT 9
- Bug when using Tullio + LoopVectorization HOT 5
- Add Finch.jl backend HOT 4
- CUDA v4 support HOT 2
- Using threads, vs setting threads=false gives different result HOT 3
- Issue with vectorized functions on GPU HOT 3
- Error when specifying the range of an index with a UnitRange HOT 4
- Scalar indexing with CUDA HOT 10
- Please update dep of FillArrays to v1.
Recommend Projects
-
React
A declarative, efficient, and flexible JavaScript library for building user interfaces.
-
Vue.js
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
-
Typescript
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
-
TensorFlow
An Open Source Machine Learning Framework for Everyone
-
Django
The Web framework for perfectionists with deadlines.
-
Laravel
A PHP framework for web artisans
-
D3
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
-
Recommend Topics
-
javascript
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
-
web
Some thing interesting about web. New door for the world.
-
server
A server is a program made to process requests and deliver data to clients.
-
Machine learning
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
-
Visualization
Some thing interesting about visualization, use data art
-
Game
Some thing interesting about game, make everyone happy.
Recommend Org
-
Facebook
We are working to build community through open source technology. NB: members must have two-factor auth.
-
Microsoft
Open source projects and samples from Microsoft.
-
Google
Google ❤️ Open Source for everyone.
-
Alibaba
Alibaba Open Source for everyone
-
D3
Data-Driven Documents codes.
-
Tencent
China tencent open source team.
from tullio.jl.