mikaelslevinsky / fasttransforms Goto Github PK
View Code? Open in Web Editor NEW:bullettrain_front: Fast orthogonal polynomial transforms :surfer:
License: MIT License
:bullettrain_front: Fast orthogonal polynomial transforms :surfer:
License: MIT License
This has been done for
There is almost no documentation or even code comments to explain what the code does. For example, I'm trying to use c_execute_tri_lo2hi
but there is no explanation what basis the lo should be in.
julia> x = randn(100); @time cheb2jac(x, -1/2,-1/2);
0.000522 seconds (18 allocations: 1.234 KiB)
julia> x = randn(1000); @time cheb2jac(x, -1/2,-1/2);
0.025063 seconds (18 allocations: 8.297 KiB)
Single step decrements in Proriol/Zernike/spherical harmonic polynomial orders are currently done with a sweep of Givens rotations. Nominally, they come from the QR factorization of W = I±X
or I-X^2
. Multi-step decrements can be done with a Householder QR factorization of W^k
for some integer k
(c.f. https://arxiv.org/pdf/2302.08448.pdf with @dlfivefifty and @TSGut). It's reasonable then that k
would be chosen such that the Householder vectors fit exactly into SIMD registers or small multiples of these to maximize throughput.
This used to work on Travis, but now that the free trial has been exhausted and the CI has mostly been migrated to GitHub Actions, the coverage flags make the tests extremely slow. If this can somehow be fixed, it may be reimplemented by the two environment variables COV=gcov
and FT_COVERAGE=1
, and post-success make coverage && bash <(curl -s https://codecov.io/bash)
.
A cross-compiler might be more advanced than the user's personal computer. Thus, cross-compiling should be done at the highest level of SIMD provided that runtime checks on cpuid
can be implemented and have the least overhead.
julia> n = 1000; x = randn(n); @time FastTransforms.lib_jac2jac(x, 0, 0, 1, 1);
0.021386 seconds (6 allocations: 8.109 KiB)
julia> n = 10_000; x = randn(n); @time FastTransforms.lib_jac2jac(x, 0, 0, 1, 1);
0.676580 seconds (7 allocations: 78.344 KiB)
tests freeze here
...
[44cfe95a] Pkg
[de0858da] Printf
[3fa0cd96] REPL
[9a3f8284] Random
[ea8e919c] SHA
[9e88b42a] Serialization
[6462fe0b] Sockets
[2f01184e] SparseArrays
[10745b16] Statistics
[8dfed614] Test
[cf7118a7] UUIDs
[4ec0a83e] Unicode
Test Summary: | Pass Total
Special functions | 19 19
Test Summary: | Pass Total
Chebyshev transform | 250 250
Test Summary: | Pass Total
Fejér and Clenshaw--Curtis quadrature | 9 9
Right now, one has conditionals based on parameters, the other always executes the 1D column transforms.
[solver-mbook:~/Projects/FastTransforms] solver% export CC=gcc-8 && make
make lib
gcc-8 -Ofast -march=native -mtune=native -I./src -I/usr/local/opt/openblas/include -I/usr/local/opt/fftw/include -shared -lm -fPIC -fopenmp src/transforms.c src/rotations.c src/permute.c src/drivers.c src/fftw.c -L/usr/local/opt/openblas/lib -L/usr/local/opt/fftw/lib -lm -lblas -lfftw3 -lfftw3_threads -o libfasttransforms.dylib
In file included from src/transforms.c:3:
src/ftinternal.h:4:10: fatal error: stdlib.h: No such file or directory
#include <stdlib.h>
^~~~~~~~~~
compilation terminated.
In file included from /usr/local/opt/openblas/include/openblas_config.h:104,
from /usr/local/opt/openblas/include/cblas.h:5,
from src/fasttransforms.h:4,
from src/rotations.c:3:
/usr/local/Cellar/gcc/8.2.0/lib/gcc/8/gcc/x86_64-apple-darwin17.7.0/8.2.0/include-fixed/stdio.h:78:10: fatal error: _stdio.h: No such file or directory
#include <_stdio.h>
^~~~~~~~~~
compilation terminated.
In file included from src/permute.c:3:
src/ftinternal.h:4:10: fatal error: stdlib.h: No such file or directory
#include <stdlib.h>
^~~~~~~~~~
compilation terminated.
In file included from /usr/local/opt/openblas/include/openblas_config.h:104,
from /usr/local/opt/openblas/include/cblas.h:5,
from src/fasttransforms.h:4,
from src/drivers.c:3:
/usr/local/Cellar/gcc/8.2.0/lib/gcc/8/gcc/x86_64-apple-darwin17.7.0/8.2.0/include-fixed/stdio.h:78:10: fatal error: _stdio.h: No such file or directory
#include <_stdio.h>
^~~~~~~~~~
compilation terminated.
In file included from /usr/local/opt/openblas/include/openblas_config.h:104,
from /usr/local/opt/openblas/include/cblas.h:5,
from src/fasttransforms.h:4,
from src/fftw.c:3:
/usr/local/Cellar/gcc/8.2.0/lib/gcc/8/gcc/x86_64-apple-darwin17.7.0/8.2.0/include-fixed/stdio.h:78:10: fatal error: _stdio.h: No such file or directory
#include <_stdio.h>
^~~~~~~~~~
compilation terminated.
make[1]: *** [lib] Error 1
make: *** [all] Error 2
CC @dawese.
Dunkl-Xu polynomials with the weight (1-x^2-y^2)^β
are written in terms of equal-parameter Jacobi polynomials. These can be generalized in two ways: with inequal-parameter Jacobi polynomials for a weight like (1-sqrt(x^2+y^2))^α(1+sqrt(x^2+y^2))^β
; or with equal-parameter Jacobi polynomials with the quadratic transformations for a weight like (x^2+y^2)^α(1-x^2-y^2)^β
. The latter, which I prefer due to their correspondence with generalized Zernike, are like 2D Konoplev OPs with a symmetric weight on [-1,1] with interior algebraic factor. @dlfivefifty
Directly using OpenMP's functions may lead to issues when linking with other libraries.
m == |m| as m is non-negative. This could be confusing on this page:
https://mikaelslevinsky.github.io/FastTransforms/transforms.html
n = 600
S = KoornwinderTriangle(0.0,-0.5,-0.5)
ff = (xy) -> P(18,8,0.,-0.5,-0.5,xy...)
p = points(S,n)
v = ff.(p)
PP = plan_transform(S,v)
v̂ = PP.duffyplan*v
F = tridevec_trans(v̂)
F̌ = FastTransforms.cheb2tri(F,0.,-0.5,-0.5)
F̌₂ = PP.tri2cheb \ F
norm(F̌ - F̌₂)
size(F)
F̃ = copy(F)
c_cheb2tri(c_plan_tri2cheb(size(F,1),0.0,-0.5,-0.5),F̃)
There should be a macro that determines the user's API for the "most vectorized" driver routines.
I think I narrowed down the bug:
julia> using MultivariateOrthogonalPolynomials
julia> for _=1:10000
MultivariateOrthogonalPolynomials.CDisk2CxfPlan(2)
end
julia> for _=1:10000
MultivariateOrthogonalPolynomials.CDisk2CxfPlan(3)
end
julia> for _=1:10000
MultivariateOrthogonalPolynomials.CDisk2CxfPlan(3)
end
julia> for _=1:10000
MultivariateOrthogonalPolynomials.CDisk2CxfPlan(3)
end
julia> for _=1:10000
MultivariateOrthogonalPolynomials.CDisk2CxfPlan(3)
end
julia> for _=1:10000
MultivariateOrthogonalPolynomials.CDisk2CxfPlan(4)
end
julia> for _=1:10000
MultivariateOrthogonalPolynomials.CDisk2CxfPlan(4)
end
julia> for _=1:10000
MultivariateOrthogonalPolynomials.CDisk2CxfPlan(4)
end
julia> for _=1:10000
MultivariateOrthogonalPolynomials.CDisk2CxfPlan(4)
end
julia> for _=1:10000
MultivariateOrthogonalPolynomials.CDisk2CxfPlan(5)
end
julia> for _=1:10000
MultivariateOrthogonalPolynomials.CDisk2CxfPlan(5)
end
julia> for _=1:10000
MultivariateOrthogonalPolynomials.CDisk2CxfPlan(5)
end
julia> for _=1:10000
MultivariateOrthogonalPolynomials.CDisk2CxfPlan(1)
end
julia> for _=1:10000
MultivariateOrthogonalPolynomials.CDisk2CxfPlan(1)
end
julia> for _=1:10000
MultivariateOrthogonalPolynomials.CDisk2CxfPlan(1)
end
signal (11): Segmentation fault: 11
in expression starting at no file:0
_ZN4llvm20AAResultsWrapperPass13runOnFunctionERNS_8FunctionE at /Users/solver/Projects/julia-1.1/usr/lib/libLLVM.dylib (unknown line)
_ZN4llvm13FPPassManager13runOnFunctionERNS_8FunctionE at /Users/solver/Projects/julia-1.1/usr/lib/libLLVM.dylib (unknown line)
_ZN4llvm13FPPassManager11runOnModuleERNS_6ModuleE at /Users/solver/Projects/julia-1.1/usr/lib/libLLVM.dylib (unknown line)
_ZN4llvm6legacy15PassManagerImpl3runERNS_6ModuleE at /Users/solver/Projects/julia-1.1/usr/lib/libLLVM.dylib (unknown line)
P = MultivariateOrthogonalPolynomials.c_plan_tri2cheb(4, 0.0, 0.5, -0.5)
F = zeros(4,4); F[1]=1;
MultivariateOrthogonalPolynomials.c_cheb2tri(P, F) # Fills F with NaN
Note that a,b,c = (0,-0.5,-0.5) works fine.
FFTW allows specifying "regions", for multidimensional FFTs:
julia> X = randn(5,5,5);
julia> F = fft(X,(1,3));
julia> F[:,1,:] ≈ fft(X[:,1,:]) # 2D FFT acting on 1st and 3rd dimension
true
It turns out I need this feature for Legendre transforms.... at the moment it just does 1D transforms:
julia> X = randn(5,5);
julia> cheb2leg(X)[:,1] == cheb2leg(X[:,1])
true
I can add it to FastTransforms.jl, e.g., if I need a 2D Legendre I can do
cheb2leg(cheb2leg(X')')
but I'm curious if this could be SIMD-optimised (or multithreaded) in C to make it faster?
brew install gcc
brew install openblas
brew install fftw
export CC=gcc-8 && make
export OMP_NUM_THREADS=4
export VECLIB_MAXIMUM_THREADS=4
Using an O(n2 log n) instead of O(n2) precomputation, the colatitudinal DCTs and DSTs can be applied to the Legendre to Chebyshev connection coefficients to reduce the overall spherical harmonic synthesis and analysis to execute_sph_hi2lo
followed by a sequence of cblas_dgemm
and one longitudinal FFTW_HC2R
.
Contrast this with the current four-step approach of execute_sph_hi2lo
followed by a sequence of cblas_dtrmm
and a sequence of FFTW_REDFT01
and FFTW_RODFT01
, and finally the longitudinal FFTW_HC2R
.
Sometimes columns are not stored contiguously.
I think assuming stride(A,1) == 1
is fine (that is, ColumnMajor()
)
In preparation for Cylinder solves, in FastTransforms.jl I'm planning to add support for disk2cxf(A::Array{T,3}, α, β, 1)
in analogy to fft(A::Matrix, 1)
, etc. I can do this all in Julia via views (I assume strided array views are compatible with FastTransforms) but perhaps doing this in C there's potential for better parallelisation?
Note the interface needs some thought. Way back, when I wrote the previous paragraph, I was thinking disk2cxf(A::Array{T,3}, α, β, 1)
would act on the first two dimensions of the array but what if we want to act on the first and third? Or if we have tensor products of two disks do we want to write the transform that transforms both as disk2cxf(A::Array{T,4}, α, β, 1:2)
? Perhaps a better interface is
disk2cxf(A::Array{T,3}, α, β, (1,2)) # single disk transform on first two dimensions
disk2cxf(A::Array{T,3}, α, β, (1,3)) # single disk transform on first and last dimension
disk2cxf(A::Array{T,4}, α, β, ((1,2), (3,4)) # tensor disk transform on first and second followed by 3rd and 4th dimension
disk2cxf(A::Array{T,4}, α, β) # same as above
disk2cxf(A::Array{T,3}, α, β, ((1,2), (2,3)) # Get lost!
You might wonder why not leave it to the "user" of FastTransforms.jl. There won't likely be any speedup and multithreading is a pipe-dream. I guess the argument is (1) consistent interface (2) consistent allocation-free behaviour in higher dimensions (3) preparation for MPIFastTransforms.jl or similar for distributed memory parallelisation (which we pretty much need for Cylinders or Cubes to do anything fun)
This would make it easy to cite the package.
Hi Michael,
It took a little while to get build on linux going. Here's suggestions for tweaks to library (not worth a pull request). I only tested on ubuntu 16.04.
Append to Make.inc:
ifeq ($(UNAME), Linux)
LDLIBS += -lm
CFLAGS += -I/usr/lib/openblas-base
LDFLAGS += -L/usr/lib/openblas-base
endif
You'll also need to add in the ifeq windows block:
UNAME := Windows
Also, you'll want to add instructions to type in the shell:
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:.
Otherwise the executable can't find the .so
My tests now all run, although I don't understand the output.
PS It's not apparent from your docs how to do a simple spherical harmonic eval or projection onto a tensor product grid. Am I supposed to do sph to Fourier then do FFT2 on grid? I would have to guess the normalization, etc. An example of that would be very helpful.
PPS Good to see you at ICOSAHOM!
I remember there used to be docs on what the transforms actually do, but they seem to have disappeared from the docs page! Where's the Disk transform definition, for example??
Probably ft_
is reasonable.
Hi! I'm working on problems with data defined on the n-sphere (n >= 128 or so). From looking around the documentation it seems like this library doesn't support that--do I have the right? Are you familiar with any other libraries that support it?
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.