Oh, I didn't realize the try-catch was being run at macroexpansion time; I thought it ran at runtime, my mistake! I'm not at all concerned about the macroexpansion or compile times of Tullio.
from tullio.jl.
What's going on is that it tries to expand @avx within a try-catch block, and for the gradient it fails:
julia> let v = [-1, 0, 1, 2]
           @tullio out := abs(v[i]) verbose=true
       end
┌ Info: symbolic gradients
│   inbody =
│    1-element Array{Any,1}:
│     :(𝛥v[i] = 𝛥v[i] + conj(conj(𝛥ℛ[1]) * DiffRules._abs_deriv(v[i])))
DiffRules._abs_deriv
┌ Warning: LoopVectorization failed (symbolic gradient)
│   err =
│    LoadError: "Expression not recognized."
│    in expression starting at /Users/me/.julia/packages/Tullio/MpOl9/src/macro.jl:856
└ @ Tullio ~/.julia/packages/Tullio/MpOl9/src/macro.jl:864
And just before it fails, LoopVectorization helpfully prints out the problematic expression:
https://github.com/chriselrod/LoopVectorization.jl/blob/c21c174f0f6676ea9098e632d9bd79e5fb51e885/src/graphs.jl#L606
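To make the mechanism concrete, here is a minimal sketch of "try to expand a picky macro, fall back if it throws", all at macroexpansion time. It is not Tullio's or LoopVectorization's code: @picky, @with_fallback, SUPPORTED and check_supported are hypothetical stand-ins, and the whitelist is an assumption chosen only to mimic an "Expression not recognized." failure.

```julia
# Hypothetical whitelist standing in for whatever a vectorizing macro accepts.
const SUPPORTED = (:+, :-, :*, :abs, :eachindex)

# Walk the expression and throw on any call to a function not on the list.
function check_supported(ex)
    if ex isa Expr
        if ex.head === :call && !(ex.args[1] in SUPPORTED)
            error("Expression not recognized: ", ex.args[1])
        end
        foreach(check_supported, ex.args)
    end
    return nothing
end

# Stand-in for @avx (hypothetical): validates, then returns the code unchanged.
# A real backend would emit a transformed, vectorized loop here.
macro picky(ex)
    check_supported(ex)
    return esc(ex)
end

# Expand @picky while the caller's code is being expanded. If that throws,
# warn and return the plain loop instead. The try/catch only runs during
# macroexpansion, so the compiled function contains no try/catch at all.
macro with_fallback(ex)
    try
        return esc(macroexpand(@__MODULE__, :(@picky $ex)))
    catch err
        @warn "fast path failed, falling back to plain code" err
        return esc(ex)
    end
end

function sum_abs(v)
    s = zero(eltype(v))
    @with_fallback for i in eachindex(v)   # abs is whitelisted: "fast" path
        s += abs(v[i])
    end
    return s
end

function sum_sign(v)
    s = zero(eltype(v))
    @with_fallback for i in eachindex(v)   # sign is not: warns, then falls back
        s += sign(v[i])
    end
    return s
end
```

The warning for sum_sign appears once, when the function definition is expanded, not on every call.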
It seems a little un-Julian to need try/catch, but otherwise it seems hard to predict what will or won't work. I'm not even sure why it dislikes _abs_deriv, which is from here:
https://github.com/JuliaDiff/DiffRules.jl/blob/c97ee0b8a7431a7d707e3a6bcb66de76fdc240b1/src/rules.jl#L68
but it certainly doesn't work:
julia> let v = [-1, 0, 1, 2]
           @tullio a[i] := Tullio.DiffRules._abs_deriv(v[i])
       end
ERROR: TypeError: non-boolean (VectorizationBase.Mask{4,UInt8}) used in boolean context
Stacktrace:
[1] _abs_deriv at /Users/me/.julia/packages/DiffRules/5QwtC/src/rules.jl:72 [inlined]
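The Mask in that error suggests (assumption: I haven't checked the exact body at that commit) that _abs_deriv branches on a comparison. Under SIMD, x < 0 yields one Bool per lane, and `if`/`?:` cannot branch on that, whereas ifelse selects lane by lane. A sketch using a BitVector as a stand-in for VectorizationBase.Mask; both helper functions are hypothetical, not DiffRules' definitions:

```julia
# Hypothetical branchy derivative of abs: `?:` needs a single Bool.
branchy_abs_deriv(x) = x < zero(x) ? -one(x) : one(x)

# Hypothetical branch-free version: ifelse evaluates both arms and selects,
# so the same code also makes sense when the condition is a per-lane mask.
branchfree_abs_deriv(x) = ifelse(x < zero(x), -one(x), one(x))

v = [-1, 0, 1, 2]

# Emulate a 4-lane comparison with a BitVector "mask".
mask = v .< 0                              # BitVector: [true, false, false, false]

# Lane-wise selection works with the mask...
lanes = ifelse.(mask, -one.(v), one.(v))

# ...while branching on the whole mask throws a TypeError,
# the same kind of error as in the report above.
err = try
    mask ? -1 : 1
catch e
    e
end
```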
This particular case is silenced by 3d6d677.
Hm, won't there be a performance overhead from the try-catch?
Sure, but I don't think it's the biggest concern. On a fresh session, it takes about this long to run the macro:
julia> @time using Tullio
0.355330 seconds (1.22 M allocations: 61.950 MiB)
julia> @time @macroexpand @tullio C[i,j] = A[i,k] * B[k,j];
4.052740 seconds (12.47 M allocations: 629.727 MiB, 4.78% gc time)
julia> @time @macroexpand @tullio C[i,j] = A[i,k] * B[k,j];
0.001012 seconds (2.01 k allocations: 123.000 KiB)
and with LoopVectorization:
julia> @time using Tullio, LoopVectorization
2.222334 seconds (4.79 M allocations: 264.077 MiB, 3.54% gc time)
julia> @time @macroexpand @tullio C[i,j] = A[i,k] * B[k,j];
6.466098 seconds (16.44 M allocations: 832.955 MiB, 5.04% gc time)
julia> @time @macroexpand @tullio C[i,j] = A[i,k] * B[k,j];
0.002154 seconds (4.49 k allocations: 294.984 KiB)
I would love this to be quicker, but don't know how. Slightly sad comparison:
julia> @time using Einsum
0.035711 seconds (121.89 k allocations: 6.455 MiB)
julia> @time @macroexpand @einsum C[i,j] = A[i,k] * B[k,j];
0.276000 seconds (394.87 k allocations: 19.989 MiB, 4.82% gc time)
julia> @time @macroexpand @einsum C[i,j] = A[i,k] * B[k,j];
0.000290 seconds (458 allocations: 29.516 KiB)
(Edit: replaced with @macroexpand times, perhaps a better measure. All on Julia 1.5.0.)
If I'm doing this right, the try-catch itself costs about 60μs, compared to 1ms. I guess the big cost is expanding @avx until it fails, which could be avoided if I detected in advance when to attempt it. Earlier versions had some code for this...
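One way to "detect when to do this" is a cheap syntactic pre-check run before attempting the expensive expansion. This is only an illustration, not the code earlier Tullio versions had: KNOWN_OK is a hypothetical whitelist, not LoopVectorization's real supported set, and the try/catch would still be needed as a safety net for anything the check lets through.

```julia
# Hypothetical set of functions assumed to be safe for the vectorizing backend.
const KNOWN_OK = Set([:+, :-, :*, :abs, :exp, :log, :conj])

# Walk the expression once; return false at the first call to a function
# that is not a bare Symbol in KNOWN_OK. Qualified names such as
# DiffRules._abs_deriv are Expr(:., ...), so they are rejected too.
function looks_vectorizable(ex)
    ex isa Expr || return true
    if ex.head === :call
        f = ex.args[1]
        (f isa Symbol && f in KNOWN_OK) || return false
    end
    return all(looks_vectorizable, ex.args)
end
```

With a check like this, the macro would only try the @avx expansion when looks_vectorizable returns true, skipping the slow expand-then-fail path for bodies like the gradient above.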