fftw / fftw3

DO NOT CHECK OUT THESE FILES FROM GITHUB UNLESS YOU KNOW WHAT YOU ARE DOING. (See below.)

License: GNU General Public License v2.0

C 76.53% Shell 0.82% Perl 0.99% Makefile 3.37% OCaml 13.52% M4 3.32% CMake 0.89% Roff 0.57%

fftw3's Issues

Missing Fortran interface for fftwq_alloc_real

It looks like the script that generates fftw3q.f03 failed to produce an entry for the function fftwq_alloc_real on line 567.

! Unable to generate Fortran interface for fftwq_alloc_real

should probably be

type(C_PTR) function fftwq_alloc_real(n) bind(C, name='fftwq_alloc_real')
  import
  integer(C_SIZE_T), value :: n
end function fftwq_alloc_real

Bit Reproducibility with plan_many_dft with variable howmany

I'm opening a new issue here since the other was closed and I am unable to reopen it.

As stated in the manual, given the nature of FFTW_ESTIMATE you would expect a deterministic result (i.e. bit reproducibility). However, this is not always the case: I observe differences when comparing a plan_many with howmany=1 against a plan_many with howmany>1. Is it expected that a different plan is chosen when howmany varies, so that bit reproducibility is not supported in these cases?

I submit the following in-place R2C/C2R example using fftw_plan_many_dft_r2c/c2r with the FFTW_ESTIMATE plan flag (I have also experimented with FFTW_NO_SIMD and FFTW_UNALIGNED, and see similar failures). Many particular combinations of howmany and N produce a "failure."

I transform a "reference" 1D array of length N and compare it to the results of Ny transforms on a "test" set of identical 1D arrays (i.e., each of the Ny "rows" is initially identical to the reference row, laid out in memory as a 2D array). For the reference array I plan using howmany=1, and for the Ny test arrays I plan using howmany=Ny. For some values of N and Ny the transformed test arrays are not identical to the transformed reference array. Looking further, I see that the plans returned during these "failures" are not equal.
I won't paste the full output, which shows the results of the checks, but they differ in the LSB. Instead I paste the plans generated, allowing for comparison.
Here is the result for Ny=2, N=2522 (although, as you say, results may vary compiler to compiler), where the mismatches occur after the backward (c2r) transform only. (I have seen cases where r2c failed as well):

Reference Plan:
(rdft2-ct-dif/2
(hc2c-direct-2/4/0 "hc2cbdftv_2_avx"
(rdft2-hc2r-direct-2 "r2cb_2")
(rdft2-nop))
(dft-ct-dif/13
(dftw-generic-dif-13-97
(dft-direct-13-x97 "n1bv_13_avx"))
(dft-buffered-97-x13/13-5
(dft-vrank>=1-x13/1
(dft-rader-97/is=2/os=2
(dft-ct-dit/16
(dftw-direct-16/16 "t3bv_16_avx")
(dft-direct-6-x16 "n1_6"))
(dft-ct-dit/8
(dftw-direct-8/12 "t3fv_8_avx")
(dft-directbuf/14-12-x8 "n2fv_12_avx"))
(dft-ct-dit/8
(dftw-direct-8/12 "t3fv_8_avx")
(dft-directbuf/14-12-x8 "n2fv_12_avx"))))
(dft-r2hc-1
(rdft-rank0-tiledbuf/2-x13-x97))
(dft-nop))))

Test Plan:
(rdft2-ct-dif/2
(hc2c-direct-2/4/0-x2 "hc2cbdftv_2_avx"
(rdft2-hc2r-direct-2 "r2cb_2")
(rdft2-nop))
(dft-buffered-1261-x2/2-1
(dft-vrank>=1-x2/1
(dft-ct-dif/13
(dftw-generic-dif-13-97
(dft-direct-13-x97 "n1bv_13_avx"))
(dft-vrank>=1-x13/1
(dft-rader-97/is=2/os=26
(dft-ct-dit/8
(dftw-direct-8/28 "t1buv_8_avx")
(dft-direct-12-x8 "n1_12"))
(dft-ct-dit/8
(dftw-direct-8/12 "t3fv_8_avx")
(dft-directbuf/14-12-x8 "n2fv_12_avx"))
(dft-ct-dit/8
(dftw-direct-8/12 "t3fv_8_avx")
(dft-directbuf/14-12-x8 "n2fv_12_avx"))))))
(dft-r2hc-1
(rdft-rank0-iter-ci/2522-x2))
(dft-nop)))

Notice that the plans are similar, but not identical -- likely accounting for the slight difference in the transformed values (which are of the order 1e-10, but again, only for some of the elements). Is there something about the plan_many that itself doesn't guarantee bit reproducibility, seemingly variable with howmany?
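For checks like the one described here, a bit-level comparison is stricter than a numeric one. The following is a minimal sketch of such a check; the function name count_bit_mismatches is made up for illustration and is not part of FFTW or of the test code that follows.

```c
#include <stddef.h>
#include <string.h>

/* Count elements whose bit patterns differ between two double arrays.
 * memcmp is used instead of == so that values such as +0.0 and -0.0,
 * which compare equal numerically, are still reported as different. */
size_t count_bit_mismatches(const double *a, const double *b, size_t n)
{
    size_t i, mismatches = 0;
    for (i = 0; i < n; ++i) {
        if (memcmp(&a[i], &b[i], sizeof(double)) != 0)
            ++mismatches;
    }
    return mismatches;
}
```

Calling it once per test row against the reference row gives a per-row mismatch count that can be reported next to the printed plans.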

A test code follows (sorry if it's a bit long, I have a lot of checks in there):

#include <stdio.h>

#include "fftw3.h"

int main(void) {

double *ref, *test;
void *in, *out;
fftw_plan fpref, fptest, bpref, bptest;
int i, j, Npass, Nfail, Ndims, N, Nx, Ny, stride, rdist, cdist, inembed, onembed;

Ndims = 1;
stride = 1;
inembed = onembed = 0;
Ny = 2;
N = 2522;
Nx = (N/2+1)*2;
rdist = Nx;
cdist = N/2+1;

// Allocate reference and test arrays
ref = fftw_malloc(Nx*sizeof(double));
test = fftw_malloc(Nx*Ny*sizeof(double));

// Plan reference (Nx)
in = out = ref;
fpref = fftw_plan_many_dft_r2c(Ndims, &N, 1, in, &inembed, stride, rdist, out, &onembed, stride, cdist, FFTW_ESTIMATE);
bpref = fftw_plan_many_dft_c2r(Ndims, &N, 1, out, &onembed, stride, cdist, in, &inembed, stride, rdist, FFTW_ESTIMATE);

// Plan test (Nx*Ny)
in = out = test;
fptest = fftw_plan_many_dft_r2c(Ndims, &N, Ny, in, &inembed, stride, rdist, out, &onembed, stride, cdist, FFTW_ESTIMATE);
bptest = fftw_plan_many_dft_c2r(Ndims, &N, Ny, out, &onembed, stride, cdist, in, &inembed, stride, rdist, FFTW_ESTIMATE);

// printf("Filling ref array\n");
for (j=0; j<N; ++j){ ref[j] = (double) (j+1); }

// printf("Filling test array\n");
for (i=0; i<Ny; ++i){ for (j=0; j<N; ++j){ test[i*Nx+j] = ref[j]; } }

// printf("Executing forward plans\n");
fftw_execute(fpref);
fftw_execute(fptest);

// printf("Evaluating arrays after forward transform\n");
Nfail = 0;
Npass = 0;
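// NOTE: the loop body was lost from the original post; an element-wise comparison against the reference row is assumed here
for (i=0; i<Ny; ++i){ for (j=0; j<Nx; ++j){ if (test[i*Nx+j] == ref[j]) ++Npass; else ++Nfail; } }
printf("FWD: N = %d, Nx = %d, Ny = %d, Npass = %d, Nfail = %d\n", N, Nx, Ny, Npass, Nfail);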
if ( Nfail > 0 ){
printf("Reference Plan:\n");
fftw_print_plan(fpref);
printf("\nTest Plan:\n");
fftw_print_plan(fptest);
printf("\n\n");
}

// printf("Executing backward plans\n");
fftw_execute(bpref);
fftw_execute(bptest);

// printf("Evaluating arrays after backward transform\n");
Nfail = 0;
Npass = 0;
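// NOTE: the loop body was lost from the original post; an element-wise comparison against the reference row is assumed here
for (i=0; i<Ny; ++i){ for (j=0; j<N; ++j){ if (test[i*Nx+j] == ref[j]) ++Npass; else ++Nfail; } }
printf("BWD: N = %d, Nx = %d, Ny = %d, Npass = %d, Nfail = %d\n", N, Nx, Ny, Npass, Nfail);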
if ( Nfail > 0 ){
printf("Reference Plan:\n");
fftw_print_plan(bpref);
printf("\nTest Plan:\n");
fftw_print_plan(bptest);
printf("\n\n");
}

// printf("Freeing resources\n");
fftw_destroy_plan(fpref);
fftw_destroy_plan(fptest);
fftw_destroy_plan(bpref);
fftw_destroy_plan(bptest);
fftw_free(ref);
fftw_free(test);

return 0;
}

Thread Safe Planner

http://www.fftw.org/fftw3_doc/Thread-safety.html mentions

We do not think this should be an important restriction

I hope I can convince you otherwise.

Problem Description

fftw3 is used by a variety of audio plugins.

Those plugins are loaded into the host's memory-space (usually an audio workstation). The host has limited control of what the plugin does internally, and the plugins do not know about each other.

There is no way to ensure that two independent plugins which are linked against libfftw do not run the shared planner simultaneously. Nor is there a possibility to control this on host application level.

When two independent plugins create fftw plans at the same time, the application usually segfaults or shows similarly undesired effects.

Possible solutions for this include:

1. Statically link plugins against libfftw. Every plugin will have its own copy. The plans are not shared with other plugins (which is mostly fine). This still requires a bit of special attention: fftw symbol visibility needs to be overridden for static links, and the plugin must protect its planning routines for multiple instances of itself. Furthermore, distributors must honor that (special build of fftw + static link). -- It is very unlikely that both plugin authors and the various gnu/linux distributors get this right (most distros dislike static linking) for the growing number of audio plugins using fftw.
2. Process-separate all plugins in the host. That is not a viable option for Digital Audio Workstations, where low latency is important, context switches (particularly for the realtime thread) are heavy, and inter-process communication does not scale (compared to shared memory), especially if the DAW does not limit audio track or channel count.
3. Discourage use of fftw for audio plugins, or even refuse to load plugins using it in the host. -- not the best idea :)
4. Ship a special (ABI compatible) build of libfftw with the host application which protects the planner. Plugins in the same memory space will use the already loaded library. This requires patching libfftw, but when doing so... why not do it upstream directly. Otherwise it has similar issues as (1).

Discussion

The issue at hand is not limited to audio-application, there are likely other applications with similar problems out there (gnu-octave comes to mind, but I don't know for certain).

As the Thread-safety page mentions, it's as simple as

wrap a semaphore lock around any calls to the planner

Is there some good reason why libfftw does not do this by default?

Existing applications should not be affected by this (they're not supposed to call the planner from different threads), but that change would make all the difference for multi-threaded plugin hosts.

I suppose it could be a bit of work to wrap all planner entry-points with a semaphore, yet there may be a neat simple solution using #define.
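A rough sketch of what such a #define-based wrapper could look like is shown below. Everything in it is hypothetical: unsafe_make_plan stands in for a real planner entry point, and WITH_PLANNER_LOCK, planner_lock and safe_make_plan are names invented for illustration, not existing libfftw symbols. A pthread mutex is used here in place of a semaphore.

```c
#include <pthread.h>

/* Hypothetical stand-in for a planner entry point; in a real build this
 * would be one of the library's plan-creation functions. */
int planner_calls = 0;
static int unsafe_make_plan(int n)
{
    ++planner_calls;        /* shared state that must not be raced on */
    return n * 2;           /* dummy "plan" value for illustration */
}

/* One process-wide lock shared by every wrapped planner call. */
static pthread_mutex_t planner_lock = PTHREAD_MUTEX_INITIALIZER;

/* Evaluate an expression with the planner lock held. */
#define WITH_PLANNER_LOCK(result, expr)            \
    do {                                           \
        pthread_mutex_lock(&planner_lock);         \
        (result) = (expr);                         \
        pthread_mutex_unlock(&planner_lock);       \
    } while (0)

/* Thread-safe wrapper around the stand-in planner call. */
int safe_make_plan(int n)
{
    int plan;
    WITH_PLANNER_LOCK(plan, unsafe_make_plan(n));
    return plan;
}
```

Note that the lock only helps if every caller in the process goes through the same mutex, which is why it would have to live inside the shared library rather than in each plugin.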

I'll be happy to look into this, but before going that way, I'd like to ask if such a change would be accepted by fftw or if there is an even better solution planned for future version that will make fftw's planner thread-safe.

yours truly,
robin - for the linux-audio community and for himself

Notable audio plugins using fftw3:
http://calf.sourceforge.net/
http://factorial.hu/plugins/lv2/ir
http://guitarix.sourceforge.net/
http://breakfastquay.com/rubberband/
http://plugin.org.uk/
http://zynaddsubfx.sourceforge.net/
https://github.com/x42/meters.lv2
...

Notable affected plugin hosts:
http://ardour.org/
http://qtractor.sourceforge.net/
https://github.com/falkTX/Carla/
...

Add workaround for portland pgc++ compiler that pretends to be gcc-4.8

The portland pgc++ compiler is not able to link to FFTW-3.3.4 since the compiler pretends to be gcc-4.8, although it doesn’t support __float128. Thus, it chokes on line 373 of fftw3.h.

Obviously this is not a bug in FFTW, and we'll file it with Portland group, but since older compilers will still be around and it is trivial to fix in the header you might want to include this in the next release.

The PGI compilers can always be identified with "defined(__PGI)", so I would suggest modifying the define starting on line 361 of fftw3.h to

#if (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6)) \
 && !(defined(__ICC) || defined(__INTEL_COMPILER) || defined(__PGI)) \
 && (defined(__i386__) || defined(__x86_64__) || defined(__ia64__))
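To see which branch a particular compiler takes, the same condition can be evaluated in a small standalone file. This is only a sketch: QUAD_GUARD_ENABLED and quad_guard_enabled are names invented here, and an extra defined(__GNUC__) test is added so the file also preprocesses cleanly on compilers that do not define that macro.

```c
/* Evaluates the condition suggested in this issue and reports which
 * branch the current compiler takes (1 = quad-precision API enabled). */
#if (defined(__GNUC__) \
     && (__GNUC__ > 4 || (__GNUC__ == 4 && __GNUC_MINOR__ >= 6))) \
 && !(defined(__ICC) || defined(__INTEL_COMPILER) || defined(__PGI)) \
 && (defined(__i386__) || defined(__x86_64__) || defined(__ia64__))
#  define QUAD_GUARD_ENABLED 1
#else
#  define QUAD_GUARD_ENABLED 0
#endif

int quad_guard_enabled(void)
{
    return QUAD_GUARD_ENABLED;
}
```

Compiling this with the affected compiler and printing the return value shows whether the __float128 declarations would be reached.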

" error: identifier "__float128" is undefined " when using CUDA

Dear FFTW authors,

We are trying to compile code that uses fftw and CUDA using
Cuda 6.0.1 on a 64Bit Debian/Wheezy. Compiling the minimum example

// file: test.cu
#include <fftw3.h>

int main() 
{
  fftwf_complex a;
  return 0;
}

with

$nvcc test.cu -lfftw3 -I/usr/local/cuda/include

results in several error lines :

/usr/include/fftw3.h(371): error: identifier "__float128" is undefined

The issue is probably due to an incompatibility of nvcc and gcc regarding quadmath. For the intel compiler this is handled in line 359ff. of fftw3.h. Compiling the code and providing the intel flag manually

$nvcc test.cu -lfftw3 -I/usr/local/cuda/include -D__INTEL_COMPILER

works.

Cheers,

Marvin and Tobias

fftw php - doubts about object cloning, users space/maillist

Hi guys, at bukka/php-fftw we have two doubts.
1)
In PHP we can clone an object. The problem is: how do we clone it? I was thinking about exporting wisdom from the first object and importing it into the second object. Could that solve the problem?

I'm taking this idea from here:
http://www.fftw.org/fftw3_doc/Wisdom-Export.html#Wisdom-Export
http://www.fftw.org/fftw3_doc/Wisdom-Import.html#Wisdom-Import

2)
What's the best place to ask questions about the fftw lib? I will probably have more doubts :)

Crosscompile for AARCH64 with neon fails

Hello,
I'm trying to cross-compile from Debian 8 to AARCH64 with NEON using aarch64-linux-gnu-gcc. Without NEON I have no problems, but when I set the --enable-neon flag the build fails.

Here is my build-log:
fftw-build.txt

I really don't know what I'm doing wrong.

Wrong height in the multi dimensional dft of real data doc

I'm looking at the documentation at http://www.fftw.org/doc/Multi_002dDimensional-DFTs-of-Real-Data.html#Multi_002dDimensional-DFTs-of-Real-Data

This page seems to say that, given a WxH array of real data, the r2c transformation will produce a (W/2+1)xH complex array. Unfortunately, it seems the actual output is (W/2+1)x(H/2) complex values.

The following code can be used to verify it: https://gist.github.com/ubitux/5442675
Output of this code is: https://gist.github.com/ubitux/5442713

If you attempt to change the number of displayed rows in the print_fft_block() function (to H, as the doc seems to say, or even to H/2+1), it will cause various invalid reads.

Assuming the code is correct, I believe something is wrong in the documentation, or at least not clear.
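For reference, the manual describes the complex output of a multi-dimensional r2c transform as n[0] x n[1] x ... x (n[d-1]/2 + 1) in row-major order, so only the last dimension is shortened. The helper below is a small sketch of that size rule; r2c_complex_count is a hypothetical name, not an FFTW function.

```c
#include <stddef.h>

/* Number of complex outputs for a rank-d real-to-complex transform whose
 * real input has dimensions n[0] x n[1] x ... x n[rank-1] (row-major).
 * Only the last dimension is cut to n[rank-1]/2 + 1. */
size_t r2c_complex_count(int rank, const int *n)
{
    size_t count = 1;
    int i;
    for (i = 0; i < rank; ++i) {
        size_t dim = (size_t)n[i];
        if (i == rank - 1)
            dim = dim / 2 + 1;   /* integer division, rounds down */
        count *= dim;
    }
    return count;
}
```

Which of W and H ends up as the last dimension depends on the order in which the sizes are passed to the planner, so swapping them changes both the output shape and the number of elements that may be read.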

Modified input after calling fftw_execute_dft_c2r

Hey folks,

I came across this strange behavior of the fftw_execute_dft_c2r method. It should be reproduced by the following code:

#include <stdio.h>
#include <fftw3.h>

int
main()
{
    double x1[256], x2[256];
    fftw_complex z[129], ztest[129];
    double intest[256];
    int n = 256, i;

    for (i = 0; i < n; i++) {
        x1[i] = (i < 86 ? 0 : (i < 172 ? 1 : 0));
    }

    fftw_plan plan_r2c = fftw_plan_many_dft_r2c(1, &n, 1,
                                      intest, NULL, 1, 0,
                                      z, NULL, 1, 0,
                                      FFTW_MEASURE | FFTW_UNALIGNED);

    fftw_execute_dft_r2c(plan_r2c, x1, z);

    for (i = 0; i < 129; i++) {
        printf("%g + %gi\n", z[i][0], z[i][1]);
    }

    printf("\n\n\n");

    fftw_plan plan_c2r = fftw_plan_many_dft_c2r(1, &n, 1,
                                      ztest, NULL, 1, 0,
                                      x2, NULL, 1, 0,
                                      FFTW_MEASURE | FFTW_UNALIGNED);

    fftw_execute_dft_c2r(plan_c2r, z, x2);

    for (i = 0; i < 129; i++) {
        printf("%g + %gi\n", z[i][0], z[i][1]);
    }

}

As I understand it, when a plan is properly initialized as an out-of-place transform, the input should not be modified upon execution. This is true for r2c and for the normal dft, but not in the c2r case. The plan creation doesn't seem to differ (in comparison to r2c and normal dft), so I am wondering if this could be a bug?

PS: This was reproduced by me on a Mac (OSX 10.9.2 with FFTW3) and on a Linux/Ubuntu machine independently.
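The FFTW manual documents that out-of-place c2r transforms may overwrite their input by default, and that the FFTW_PRESERVE_INPUT planner flag requests otherwise. Independently of that flag, a caller can protect a buffer by transforming a scratch copy. The sketch below shows that pattern with a toy stand-in transform; destructive_fn, run_preserving_input and clobbering_double are hypothetical names, not FFTW API.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical signature for any transform that may overwrite its input. */
typedef void (*destructive_fn)(double *in, double *out, size_t n);

/* Run `fn` on a scratch copy of `in` so the caller's buffer is untouched.
 * Returns 0 on success, -1 if the scratch buffer cannot be allocated. */
int run_preserving_input(destructive_fn fn, const double *in,
                         double *out, size_t n)
{
    double *scratch = malloc(n * sizeof *scratch);
    if (scratch == NULL)
        return -1;
    memcpy(scratch, in, n * sizeof *scratch);
    fn(scratch, out, n);
    free(scratch);
    return 0;
}

/* Toy stand-in: writes 2*in to out, then clobbers in. */
void clobbering_double(double *in, double *out, size_t n)
{
    size_t i;
    for (i = 0; i < n; ++i) {
        out[i] = 2.0 * in[i];
        in[i] = 0.0;
    }
}
```

The copy costs one extra buffer and one memcpy per call, which is the trade-off against letting the transform reuse the input as workspace.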

Licensing issues regarding pure idioms used in fftw3

Can a header including this class be released in public domain, while its implementation is released with GPL 2 or later?

class PlanFloat_1dR2C
    {
    public:
        typedef std::complex<float> OutputType;
        typedef float InputType;

        static size_t sizeOut(size_t size_in)
            {return size_in/2+1;}

        PlanFloat_1dR2C(InputType* buffer_in
            ,OutputType* buffer_out, size_t n_elem);
        void execute();
        ~PlanFloat_1dR2C();

    private:
        void* plan;
    };

As you may notice, it is a wrapper around an interface similar to FFTW3 (GPL), but since the header itself does not explicitly refer to that library, the implementation may lie in another library. Or does the copyright of FFTW3 also cover the create-execute-destroy idiom?

compiler errors

I'm getting weird compiler errors when trying to run the bootstrap.sh file. I've tried gcc 4.4.7, 4.8.2, and 5.2.0 and they all give me strange errors when running make. Here's one example:

libtool: compile: gcc -DHAVE_CONFIG_H -I. -I.. -I../simd -std=c99 -MT timer.lo -MD -MP -MF .deps/timer.Tpo -c timer.c -o timer.o
In file included from timer.c:29:0:
cycle.h: In function ‘getticks’:
cycle.h:226:6: error: ‘asm’ undeclared (first use in this function)
 asm volatile("rdtsc" : "=a" (a), "=d" (d));
 ^
cycle.h:226:6: note: each undeclared identifier is reported only once for each function it appears in
cycle.h:226:10: error: expected ‘;’ before ‘volatile’
 asm volatile("rdtsc" : "=a" (a), "=d" (d));
 ^
make[2]: Leaving directory `/scratch/mkg52/_MONSOON_SOFTWARE/fftw3/kernel'
make[2]: *** [timer.lo] Error 1
make[1]: Leaving directory `/scratch/mkg52/_MONSOON_SOFTWARE/fftw3'
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

The config.log has a bunch of errors in it as well. Most of them look like this: conftest.c:17:7: error: 'thisisanerror' undeclared (first use in this function).

I've gotten fftw-3.3.4 to compile without any problems, but this current version is not working at all. Does anyone have any ideas?

Compiling from repository in Mac OS X error

In OSX 10.9.2 with all the dependencies: autoconf, autolib, ocaml, ...
Downloading the repo and running:

./bootstrap.sh
make

Produces an error:

/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: .libs/libsimd_sse2_nonportable.a(libsimd_sse2_nonportable_la-sse2-nonportable.o) has no symbols
libtool: link: ranlib .libs/libsimd_sse2_nonportable.a
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: file: .libs/libsimd_sse2_nonportable.a(libsimd_sse2_nonportable_la-sse2-nonportable.o) has no symbols
/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ranlib: warning for library: .libs/libsimd_sse2_nonportable.a the table of contents is empty (no object file members in the library define global symbols)
libtool: link: ( cd ".libs" && rm -f "libsimd_sse2_nonportable.la" && ln -s "../libsimd_sse2_nonportable.la" "libsimd_sse2_nonportable.la" )
Making all in dft
Making all in scalar
Making all in codelets
(cat ../../../COPYRIGHT ../../../support/codelet_prelude.dft; sh ../../../support/twovers.sh ../../../genfft/gen_notw.native -compact -variables 4 -pipeline-latency 4 -n 2 -name n1_2 -include "n.h") | sed -e s/@DATE@/"`date`"/ | indent -kr -cs -i5 -l800 -fca -nfc1 -sc -sob -cli4 -TR -Tplanner -TV >n1_2.c
indent: Command line: unknown parameter "-kr"
make[4]: *** [n1_2.c] Error 1
make[3]: *** [all-recursive] Error 1
make[2]: *** [all-recursive] Error 1
make[1]: *** [all-recursive] Error 1
make: *** [all] Error 2

Problem initializing plan

Hi.
I'm trying to use snd2fftw (http://snd2fftw.sourceforge.net/) to apply a DFT to an audio file, but I receive the error: Fail to initialize FFTW plan.
The author no longer supports the software. What could be causing this?

alloc.c:269: assertion failed: p

I'm using FFTW 3.3.3, and I'm getting this error on some of the clusters on OSG:

alloc.c:269: assertion failed: p

which corresponds to the #ifdef MIN_ALIGNMENT block:

void *X(malloc_plain)(size_t n)
{
     void *p;
     if (n == 0)
          n = 1;
     p = X(kernel_malloc)(n);
     CK(p);

#ifdef MIN_ALIGNMENT
     A((((uintptr_t)p) % MIN_ALIGNMENT) == 0);
#endif

     return p;
}

These are my build options:

./configure --enable-single --enable-sse2 --enable-avx \
    --enable-threads --prefix=/home/yutong/fftw_install/

Any ideas?
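One observation that may help narrow this down: the message text is just "p", which would match the CK(p) check (a NULL return from the allocator) rather than the longer alignment expression, assuming the check macros print their argument as written. The sketch below illustrates that stringification behaviour with a hypothetical CHECK macro; it is not FFTW's actual macro definition.

```c
#include <stddef.h>
#include <stdint.h>

/* Records the text of the most recent failed check (NULL if none). */
const char *last_failed_expr = NULL;

/* Hypothetical check macro: on failure it stores the stringified
 * expression, the way an "assertion failed: <expr>" message would. */
#define CHECK(ex) ((void)((ex) || (last_failed_expr = #ex, 0)))

/* Mirrors the two checks in the function quoted above. Returns 1 if both
 * checks pass, 0 otherwise. */
int check_pointer(void *p, size_t alignment)
{
    last_failed_expr = NULL;
    CHECK(p);
    if (last_failed_expr == NULL)
        CHECK((((uintptr_t)p) % alignment) == 0);
    return last_failed_expr == NULL;
}
```

With a NULL pointer the recorded text is exactly "p"; with a misaligned pointer it would be the full modulo expression instead.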

PLANNING PROBLEM misleading

https://github.com/FFTW/fftw3/blob/master/tools/fftw-wisdom.c#L43

When running fftw-wisdom -v it outputs text like this:

PLANNING PROBLEM: cif1024
PLANNING PROBLEM: cib1024
PLANNING PROBLEM: cif2048
PLANNING PROBLEM: cib2048

I kept trying to work out what the problem was, and was annoyed that it wasn't giving more details about this error message. It turns out that despite the SHOUTING it is actually not an error, and everything is good. I suggest changing the text to be far less ambiguous. For example, remove the word 'problem' and change PLANNING to Planning.

make check fails on Power8 with gcc 5

$ ./configure --enable-float --enable-fma --enable-vsx && make -j96 && make check

Executing "/tmp/fftw-3.3.5/tests/bench --verbose=1 --verify 'ok5o11x4o10x6o10x11o11_4' --verify 'ik5o11x4o10x6o10x11o11_4' --verify '//obr9x5x24' --verify '//ofr9x5x24' --verify 'obr9x5x24' --verify 'ibr9x5x24' --verify 'ofr9x5x24' --verify 'ifr9x5x24' --verify '//obc9x5x24' --verify '//ibc9x5x24' --verify '//ofc9x5x24' --verify '//ifc9x5x24' --verify 'obc9x5x24' --verify 'ibc9x5x24' --verify 'ofc9x5x24' --verify 'ifc9x5x24' --verify 'ok12hx4hx13o01x13e10' --verify 'ik12hx4hx13o01x13e10' --verify 'obrd4x2x4x10v8' --verify 'ibrd4x2x4x10v8' --verify 'ofrd4x2x4x10v8' --verify 'ifrd4x2x4x10v8' --verify '//obcd4x2x4x10v8' --verify '//ibcd4x2x4x10v8' --verify '//ofcd4x2x4x10v8' --verify '//ifcd4x2x4x10v8' --verify 'obcd4x2x4x10v8' --verify 'ibcd4x2x4x10v8' --verify 'ofcd4x2x4x10v8' --verify 'ifcd4x2x4x10v8' --verify 'okd11088o11' --verify 'ikd11088o11' --verify 'obr8x4x8x4_6' --verify 'ibr8x4x8x4_6' --verify 'ofr8x4x8x4_6' --verify 'ifr8x4x8x4_6' --verify '//obc8x4x8x4_6' --verify '//ibc8x4x8x4_6' --verify '//ofc8x4x8x4_6' --verify '//ifc8x4x8x4_6' --verify 'obc8x4x8x4_6' --verify 'ibc8x4x8x4_6' --verify 'ofc8x4x8x4_6' --verify 'ifc8x4x8x4_6'"
ok5o11x4o10x6o10x11o11_4 2.13067e-07 3.24252e-06 2.67794e-07
ik5o11x4o10x6o10x11o11_4 2.16295e-07 3.14798e-06 2.42186e-07
//obr9x5x24 1.76873e-07 5.80388e-07 1.90459e-07
//ofr9x5x24 2.16065e-07 5.80388e-07 1.864e-07
Found relative error 3.978176e+13 (impulse 1)
[...]

Reproduced on Power8 with gcc 5.2 and 5.3; with gcc 4.8.4 and 4.9.1 the tests pass.

fftw.org is down

Grrr.

Does planning a FFT depend on the actual data content?

Basically the title. Does the plan generation for a DFT depend on the actual content of the input array, or just the size and whether the input and output are co-located?

I'm working in an environment where I have many worker threads doing parallel DFTs, and moving the planning out of the threads (that was a fun bug to track down) means I don't have ready access to an example instance of the data to pass into the planner. Can I just use an empty array of the same size as the data?

RFE: fftw.pc.in conditional MPI library inclusion

I understand that the position of those working on FFTW is to minimize library dependencies (at least with regards to pthreads and fftw3*_threads), but when building FFTW with MPI enabled, it may be desirable to include -lfftw3*_mpi in the pkg-config libraries portion. I'm not sure if this may cause problems on other systems, but at least in Fedora, the fftw3*.pc files will be installed in their respective MPI PKG_CONFIG_PATH (i.e. /usr/lib64/mpich/lib/pkgconfig or similar) and will not conflict with those non-MPI pc files when the user loads the corresponding MPI module (i.e. module load mpi/mpich-x86_64) for configuring and building with MPI.

In any case, an easy way to conditionally include -lfftw3*_mpi is to add the following lines to configure.ac, inside the if test "$enable_mpi" = "yes"; then block:

LIBFFTW3MPI=-lfftw3${PREC_SUFFIX}_mpi
AC_SUBST(LIBFFTW3MPI)

and change the Libs: in fftw.pc.in to
Libs: -L${libdir} @LIBFFTW3MPI@ -lfftw3@PREC_SUFFIX@ @LIBQUADMATH@.

fftw 3.3.3 fails to build documentation with perl 5.18.0

if (/bin/sh /usr/src/fftw-3.3.3/fftw-single-3.3.3/missing --run makeinfo --version) >/dev/null 2>&1; then
for f in fftw3.info fftw3.info-[0-9] fftw3.info-[0-9][0-9] fftw3.i[0-9] fftw3.i[0-9][0-9]; do
if test -f $f; then mv $f $backupdir; restore=mv; else :; fi;
done;
else :; fi &&
cd "$am__cwd";
if /bin/sh /usr/src/fftw-3.3.3/fftw-single-3.3.3/missing --run makeinfo -I .
-o fftw3.info fftw3.texi;
then
rc=0;
CDPATH="${ZSH_VERSION+.}:" && cd .;
else
rc=$?;
CDPATH="${ZSH_VERSION+.}:" && cd . &&
$restore $backupdir/* echo "./fftw3.info" | sed 's|[^/]*$||';
fi;
rm -rf $backupdir; exit $rc
fftw3.texi:159: misplaced {
fftw3.texi:159: misplaced }
fftw3.texi:160: misplaced {
fftw3.texi:160: misplaced }
./intro.texi:16: warning: @ifinfo should only appear at a line beginning (possibly involving @onlogn)
./intro.texi:16: superfluous argument to @end tex: algorithms for all lengths, including (possibly involving @onlogn)
./tutorial.texi:100: warning: @ifinfo should only appear at a line beginning (possibly involving @onlogn)
./tutorial.texi:100: superfluous argument to @end tex: algorithm). (possibly involving @onlogn)
./tutorial.texi:358: warning: @ifinfo should only appear at a line beginning (possibly involving @onlogn)
./tutorial.texi:358: superfluous argument to @end tex: algorithm is used even for prime sizes. (possibly involving @onlogn)
./tutorial.texi:417: warning: @ifinfo should only appear at a line beginning (possibly involving @ndims)
./tutorial.texi:417: superfluous argument to @end tex: (in row-major order). (possibly involving @ndims)
./tutorial.texi:418: warning: @ifinfo should only appear at a line beginning (possibly involving @ndimshalf)
./tutorial.texi:418: superfluous argument to @end tex: array of (possibly involving @ndimshalf)
./tutorial.texi:425: warning: @ifinfo should only appear at a line beginning (possibly involving @ndims)
./tutorial.texi:425: superfluous argument to @end tex: and the complex (possibly involving @ndims)
./tutorial.texi:426: warning: @ifinfo should only appear at a line beginning (possibly involving @ndimshalf)
./tutorial.texi:426: superfluous argument to @end tex: . (possibly involving @ndimshalf)
./tutorial.texi:566: warning: @ifinfo should only appear at a line beginning (possibly involving @onlogn)
./tutorial.texi:566: superfluous argument to @end tex: algorithm is used even for prime sizes. (possibly involving @onlogn)
./other.texi:97: warning: @ifinfo should only appear at a line beginning (possibly involving @ndims)
./other.texi:97: superfluous argument to @end tex: . Now, we specify a location in the array by a (possibly involving @ndims)
./other.texi:209: superfluous argument to @end tex: rank-3 array: (possibly involving @threedims)
./reference.texi:384: warning: @ifinfo should only appear at a line beginning (possibly involving @onlogn)
./reference.texi:384: superfluous argument to @end tex: performance even for prime sizes). It is possible to customize FFTW (possibly involving @onlogn)
./reference.texi:632: warning: @ifinfo should only appear at a line beginning (possibly involving @onlogn)
./reference.texi:632: superfluous argument to @end tex: performance even for prime sizes). (It is possible to customize FFTW (possibly involving @onlogn)
./reference.texi:711: warning: @ifinfo should only appear at a line beginning (possibly involving @ndims)
./reference.texi:711: warning: @ifinfo should only appear at a line beginning (possibly involving @ndimshalf)
./reference.texi:711: superfluous argument to @end tex: , the complex data is an n[0] x n[1] x n[2] x ... x (n[d-1]/2 + 1) (possibly involving @ndims)
./reference.texi:711: superfluous argument to @end tex: array of (possibly involving @ndimshalf)
./reference.texi:721: warning: @ifinfo should only appear at a line beginning (possibly involving @ndims)
./reference.texi:721: superfluous argument to @end tex: in row-major order. (possibly involving @ndims)
./reference.texi:841: warning: @ifinfo should only appear at a line beginning (possibly involving @onlogn)
./reference.texi:841: superfluous argument to @end tex: performance even for prime sizes). (It is possible to customize FFTW (possibly involving @onlogn)
./reference.texi:2376: warning: @ifinfo should only appear at a line beginning (possibly involving @ndims)
./reference.texi:2376: superfluous argument to @end tex: multi-dimensional real-input DFT, the full (logical) complex output array (possibly involving @ndims)
./mpi.texi:262: warning: @ifinfo should only appear at a line beginning (possibly involving @twodims)
./mpi.texi:262: superfluous argument to @end tex: complex DFT, distributed over 4 (possibly involving @twodims)
./mpi.texi:263: warning: @ifinfo should only appear at a line beginning (possibly involving @twodims)
./mpi.texi:263: superfluous argument to @end tex: slice of the data. (possibly involving @twodims)
./mpi.texi:278: warning: @ifinfo should only appear at a line beginning (possibly involving @twodims)
./mpi.texi:278: superfluous argument to @end tex: array on three processes, you can (possibly involving @twodims)
./mpi.texi:318: warning: @ifinfo should only appear at a line beginning (possibly involving @twodims)
./mpi.texi:318: superfluous argument to @end tex: complex-DFT example, above, we would find (possibly involving @twodims)
./mpi.texi:472: warning: @ifinfo should only appear at a line beginning (possibly involving @ndims)
./mpi.texi:472: superfluous argument to @end tex: . As always, it is distributed along (possibly involving @ndims)
./mpi.texi:473: warning: @ifinfo should only appear at a line beginning (possibly involving @dimk)
./mpi.texi:473: superfluous argument to @end tex: . Now, if we compute its DFT with the (possibly involving @dimk)
./mpi.texi:475: warning: @ifinfo should only appear at a line beginning (possibly involving @ndimstrans)
./mpi.texi:475: superfluous argument to @end tex: , (possibly involving @ndimstrans)
./mpi.texi:476: warning: @ifinfo should only appear at a line beginning (possibly involving @dimk)
./mpi.texi:476: superfluous argument to @end tex: dimension. Conversely, if we take the (possibly involving @dimk)
./mpi.texi:477: superfluous argument to @end tex: data and transform it with the (possibly involving @ndimstrans)
./mpi.texi:479: warning: @ifinfo should only appear at a line beginning (possibly involving @ndims)
./mpi.texi:479: superfluous argument to @end tex: array. (possibly involving @ndims)
./mpi.texi:483: warning: @ifinfo should only appear at a line beginning (possibly involving @ndimstrans)
./mpi.texi:483: superfluous argument to @end tex: (the (possibly involving @ndimstrans)
./mpi.texi:586: warning: @ifinfo should only appear at a line beginning (possibly involving @ndims)
./mpi.texi:586: superfluous argument to @end tex: real (possibly involving @ndims)
./mpi.texi:587: warning: @ifinfo should only appear at a line beginning (possibly involving @ndimshalf)
./mpi.texi:587: superfluous argument to @end tex: complex data: the last dimension of the (possibly involving @ndimshalf)
./mpi.texi:590: warning: @ifinfo should only appear at a line beginning (possibly involving @ndims)
./mpi.texi:590: superfluous argument to @end tex: dimensions of the real data. (possibly involving @ndims)
./mpi.texi:594: warning: @ifinfo should only appear at a line beginning (possibly involving @ndims)
./mpi.texi:594: superfluous argument to @end tex: , it is (possibly involving @ndims)
./mpi.texi:595: warning: @ifinfo should only appear at a line beginning (possibly involving @ndimspad)
./mpi.texi:595: superfluous argument to @end tex: array, where the last (possibly involving @ndimspad)
./mpi.texi:608: warning: @ifinfo should only appear at a line beginning (possibly involving @ndimshalf)
./mpi.texi:616: warning: @ifinfo should only appear at a line beginning (possibly involving @threedims)
./mpi.texi:616: superfluous argument to @end tex: real data [padded to L x M x 2(N/2+1) (possibly involving @threedims)
./mpi.texi:616: superfluous argument to @end tex: ], (possibly involving @threedims)
./mpi.texi:617: warning: @ifinfo should only appear at a line beginning (possibly involving @threedims)
./mpi.texi:617: superfluous argument to @end tex: complex data. Similar to the (possibly involving @threedims)
./mpi.texi:665: superfluous argument to @end tex: in row-major order, so its (possibly involving @threedims)
./mpi.texi:674: warning: @ifinfo should only appear at a line beginning (possibly involving @threedims)
./mpi.texi:674: superfluous argument to @end tex: r2c (possibly involving @threedims)
./mpi.texi:676: warning: @ifinfo should only appear at a line beginning (possibly involving @threedims)
./mpi.texi:676: superfluous argument to @end tex: real array (possibly involving @threedims)
./mpi.texi:678: superfluous argument to @end tex: complex array distributed over the @code{M} (possibly involving @threedims)
./mpi.texi:699: warning: @ifinfo should only appear at a line beginning (possibly involving @twodims)
./mpi.texi:699: superfluous argument to @end tex: that is (possibly involving @twodims)
./mpi.texi:868: warning: @ifinfo should only appear at a line beginning (possibly involving @twodims)
./mpi.texi:868: superfluous argument to @end tex: transpose on @code{P} processes, (possibly involving @twodims)
./mpi.texi:1295: warning: @ifinfo should only appear at a line beginning (possibly involving @ndims)
./mpi.texi:1295: superfluous argument to @end tex: array that is stored on the local (possibly involving @ndims)
./mpi.texi:1299: warning: @ifinfo should only appear at a line beginning (possibly involving @ndimstrans)
./mpi.texi:1299: warning: @xref should not appear in @end
./mpi.texi:1299: @xref missing close brace
./mpi.texi:1299: superfluous argument to @end tex: transposed output. @xref{Transposed
} (possibly involving @ndimstrans)
./mpi.texi:1300: misplaced }
./mpi.texi:1414: warning: @ifinfo should only appear at a line beginning (possibly involving @ndims)
./mpi.texi:1414: superfluous argument to @EnD tex: input data and the first dimension (possibly involving @ndims)
./mpi.texi:1415: warning: @ifinfo should only appear at a line beginning (possibly involving @ndimstrans)
./mpi.texi:1415: superfluous argument to @EnD tex: transposed data (at intermediate (possibly involving @ndimstrans)
./mpi.texi:1451: warning: @ifinfo should only appear at a line beginning (possibly involving @ndims)
./mpi.texi:1451: superfluous argument to @EnD tex: transform is (possibly involving @ndims)
./mpi.texi:1452: warning: @ifinfo should only appear at a line beginning (possibly involving @ndimstrans)
./mpi.texi:1452: warning: @Xref should not appear in @EnD
./mpi.texi:1452: superfluous argument to @EnD tex: . @Xref{Transposed distributions}. (possibly involving @ndimstrans)
./mpi.texi:1498: warning: @ifinfo should only appear at a line beginning (possibly involving @ndims)
./mpi.texi:1498: warning: @ifinfo should only appear at a line beginning (possibly involving @ndimshalf)
./mpi.texi:1498: superfluous argument to @EnD tex: real data to/from n[0] x n[1] x n[2] x ... x (n[d-1]/2 + 1) (possibly involving @ndims)
./mpi.texi:1498: superfluous argument to @EnD tex: complex (possibly involving @ndimshalf)
./mpi.texi:1501: warning: @ifinfo should only appear at a line beginning (possibly involving @ndimspad)
./mpi.texi:1699: superfluous argument to @EnD tex: complex DFT in-place. (This assumes you have already (possibly involving @twodims)
./mpi.texi:1738: warning: @ifinfo should only appear at a line beginning (possibly involving @twodims)
./mpi.texi:1738: superfluous argument to @EnD tex: Fortran array is viewed by FFTW in C as a (possibly involving @twodims)
./mpi.texi:1739: superfluous argument to @EnD tex: array. This means that the array was distributed over (possibly involving @twodims)
./mpi.texi:1741: superfluous argument to @EnD tex: array in Fortran. (You must @Emph{not} use an (possibly involving @twodims)
./mpi.texi:1742: warning: @ifinfo should only appear at a line beginning (possibly involving @twodims)
./mpi.texi:1742: superfluous argument to @EnD tex: array, (possibly involving @twodims)
./mpi.texi:1752: warning: @ifinfo should only appear at a line beginning (possibly involving @twodims)
./mpi.texi:1752: superfluous argument to @EnD tex: array, associated with the @Emph{same} (possibly involving @twodims)
./modern-fortran.texi:177: warning: @ifinfo should only appear at a line beginning (possibly involving @threedims)
./modern-fortran.texi:177: superfluous argument to @EnD tex: ) arrays: (possibly involving @threedims)
./modern-fortran.texi:190: warning: @ifinfo should only appear at a line beginning (possibly involving @threedims)
./modern-fortran.texi:190: superfluous argument to @EnD tex: array. (possibly involving @threedims)
./modern-fortran.texi:209: warning: @ifinfo should only appear at a line beginning (possibly involving @threedims)
./modern-fortran.texi:209: superfluous argument to @EnD tex: real input (possibly involving @threedims)
./modern-fortran.texi:210: warning: @ifinfo should only appear at a line beginning (possibly involving @threedims)
./modern-fortran.texi:210: superfluous argument to @EnD tex: complex output). In Fortran, because (possibly involving @threedims)
./modern-fortran.texi:213: warning: @ifinfo should only appear at a line beginning (possibly involving @threedims)
./modern-fortran.texi:213: superfluous argument to @EnD tex: real input in Fortran: (possibly involving @threedims)
./modern-fortran.texi:232: warning: @ifinfo should only appear at a line beginning (possibly involving @threedims)
./modern-fortran.texi:232: superfluous argument to @EnD tex: array, even though only (possibly involving @threedims)
./modern-fortran.texi:233: superfluous argument to @EnD tex: of it is actually used. In this example, we will (possibly involving @threedims)
./modern-fortran.texi:471: warning: @ifinfo should only appear at a line beginning (possibly involving @threedims)
./modern-fortran.texi:471: superfluous argument to @EnD tex: array. (possibly involving @threedims)
./modern-fortran.texi:484: warning: @ifinfo should only appear at a line beginning (possibly involving @threedims)
./modern-fortran.texi:484: superfluous argument to @EnD tex: array. (Alternatively, you can (possibly involving @threedims)
./modern-fortran.texi:495: warning: @ifinfo should only appear at a line beginning (possibly involving @twodims)
./modern-fortran.texi:495: superfluous argument to @EnD tex: 2d real array: (possibly involving @twodims)
./modern-fortran.texi:506: warning: @ifinfo should only appear at a line beginning (possibly involving @threedims)
./modern-fortran.texi:506: superfluous argument to @EnD tex: 3d complex array: (possibly involving @threedims)
make[3]: *** [fftw3.info] Error 1
make[3]: Leaving directory `/usr/src/fftw-3.3.3/fftw-single-3.3.3/doc'
make[2]: *** [all-recursive] Error 1
make[2]: Leaving directory `/usr/src/fftw-3.3.3/fftw-single-3.3.3/doc'
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory `/usr/src/fftw-3.3.3/fftw-single-3.3.3'
make: *** [all] Error 2

error when compiling with --enable-avx

I have a problem when I compile the current version (cd2b27d) with --enable-avx (./configure --enable-shared --enable-sse2 --enable-avx --enable-avx2 --enable-fma --enable-maintainer-mode):

In file included from ../../../dft/simd/n1f.h:21:0,
from ../common/n1fv_3.c:35,
from n1fv_3.c:3:
../../../simd-support/simd-avx.h:254:27: error: incompatible type for argument 2 of 'ST'
#define VFMAI(b, c) SUFF(_mm_addsub256_p)(c,FLIP_RI(b))

fftw-wisdom-to-conf manpage has wrong example

The manpage gives this example:

   fftw-wisdom -n cof1024 cob1024 -o wisdom

which fails with:

bench: problem.c:96: assertion failed: isdigit(*s)

The correct invocation seems to be:

fftw-wisdom -n -o wisdom cof1024 cob1024

With that change the example works (provided f230f8c is applied).

Crash on ARM

This FFTW test crashes on ARM. It is running on a Chromebook in an Ubuntu 12.04 chroot. The GCC version is 4.8.2.

make[3]: Entering directory `/home/viral/julia/deps/fftw-3.3.4-single/tests'
perl -w ./check.pl -r -c=30 -v `pwd`/bench
Executing "/home/viral/julia/deps/fftw-3.3.4-single/tests/bench --verbose=1 --verify 'ok88o01_152' --verify 'ik88o01_152' --verify 'obr5x14x2x11_5' --verify 'ibr5x14x2x11_5' --verify 'ofr5x14x2x11_5' --verify 'ifr5x14x2x11_5' --verify '//obc5x14x2x11_5' --verify '//ibc5x14x2x11_5' --verify '//ofc5x14x2x11_5' --verify '//ifc5x14x2x11_5' --verify 'obc5x14x2x11_5' --verify 'ibc5x14x2x11_5' --verify 'ofc5x14x2x11_5' --verify 'ifc5x14x2x11_5' --verify 'ok8e01_29' --verify 'ik8e01_29' --verify 'obr4x2v15' --verify 'ibr4x2v15' --verify 'ofr4x2v15' --verify 'ifr4x2v15' --verify '//obc4x2v15' --verify '//ibc4x2v15' --verify '//ofc4x2v15' --verify '//ifc4x2v15' --verify 'obc4x2v15' --verify 'ibc4x2v15' --verify 'ofc4x2v15' --verify 'ifc4x2v15' --verify 'ok8e10x7bx11e00x10e10' --verify 'ik8e10x7bx11e00x10e10' --verify '//obr40x11' --verify '//ofr40x11' --verify 'obr40x11' --verify 'ibr40x11' --verify 'ofr40x11' --verify 'ifr40x11' --verify '//obc40x11' --verify '//ibc40x11' --verify '//ofc40x11' --verify '//ifc40x11' --verify 'obc40x11' --verify 'ibc40x11' --verify 'ofc40x11' --verify 'ifc40x11'"
Segmentation fault (core dumped)

No rule to make target `n1_2.c', needed by `all'. Stop.

Ubuntu.

v3.3.4 release.

compile error

I compiled an example Fortran file that I found on the internet,
and this is the error report:

Undefined symbols for architecture x86_64:
"dfftw_destroy_plan", referenced from:
MAIN_ in ccgelF5d.o
"dfftw_execute", referenced from:
MAIN_ in ccgelF5d.o
"dfftw_plan_dft_2d", referenced from:
MAIN_ in ccgelF5d.o
ld: symbol(s) not found for architecture x86_64
collect2: error: ld returned 1 exit status
[Finished in 0.0s with exit code 1]
[cmd: ['/usr/local/bin/gfortran', '/Users/hyuna917/Dropbox/Kono Lab/Computer/数値計算演習/fortran/fftw.f', '-o', '/Users/hyuna917/Dropbox/Kono Lab/Computer/数値計算演習/fortran/fftw', '-I/usr/local/include', '-L/usr/local/lib', '-lfftw3', '-lm']]
[dir: /Users/hyuna917/Dropbox/Kono Lab/Computer/数値計算演習/fortran]
[path: /usr/bin:/bin:/usr/sbin:/sbin]

I compiled it with 'gfortran fftw.f -o fftw -I/usr/local/include -L/usr/local/lib -lfftw3 -lm'.
Why does this error occur?

Configure script in 3.3.4 release selects incorrect link flags with MACOSX_DEPLOYMENT_TARGET=10.10

When the 3.3.4 configure script is executed on OS X with MACOSX_DEPLOYMENT_TARGET=10.10 in its environment, it incorrectly selects -flat_namespace -undefined suppress for linking, instead of -undefined dynamic_lookup. This is due to a bug in Libtool 2.4.2 and earlier.

We've patched the script in MacPorts; the fix itself is trivial. The permanent solution is to regenerate the script using the just-released Libtool 2.4.3.

Bit reproducibility for FFTW3?

What is the current level of support for bit reproducibility in FFTW3?

test issue

This is a test issue.

segmentation fault with pgi 16.1

Hi -
I need to use fftw 3.3.4 with the pgi compiler 16.1. fftw does build with the following flags:

$ CC=pgcc CFLAGS="-O2 -fPIC" F77=pgfortran FFLAGS="-O2"  ./configure  --enable-avx --enable-openmp --enable-shared  --prefix=/home/steinba/software/fftw/3.3.4/pgi161-nompi

fftw builds, but the checks fail, and I see that the bench utility runs into a segmentation fault when called like this:

$ ./bench -s 64
Segmentation fault (core dumped)

The code compiles fine with pgi 15.9, and the bench utility runs without problems.
Any ideas?

Thanks,
P

$ lsb_release -a
LSB Version:    :base-4.0-amd64:base-4.0-noarch:core-4.0-amd64:core-4.0-noarch:graphics-4.0-amd64:graphics-4.0-noarch:printing-4.0-amd64:printing-4.0-noarch
Distributor ID: RedHatEnterpriseServer
Description:    Red Hat Enterprise Linux Server release 6.4 (Santiago)
Release:        6.4
Codename:       Santiago
$ uname -a
Linux tauruslogin4 2.6.32-504.3.3.el6.x86_64 #1 SMP Fri Dec 12 16:05:43 EST 2014 x86_64 x86_64 x86_64 GNU/Linux

Please clarify README for building from git.

The instructions for building from git are ambiguous, resulting in people reproducing #9.

Please change the wording to something like:

"If you are using the git repository, install ocaml, autoconf,
automake, indent and libtool, and execute the bootstrap.sh script instead of running configure directly."

make error on 3.5.0-42-generic #65~precise1-Ubuntu x86_64

make[4]: Entering directory `/home/xxx/src/fftw3/dft/scalar/codelets'
make[4]: *** No rule to make target `n1_3.c', needed by `all'. Stop.

I am using the git repository.

relocation R_X86_64_32 against;recompile with -fPIC

Kali 2.0 64 / gcc-4.8.2

/usr/bin/ld: /usr/local/lib/libfftw3f.a(mapflags.o): relocation R_X86_64_32 against `.rodata' can not be used when making a shared object; recompile with -fPIC
/usr/local/lib/libfftw3f.a: error adding symbols: Bad value
collect2: error: ld returned 1 exit status
Makefile:396: recipe for target 'libosmodsp.la' failed
make[2]: *** [libosmodsp.la] Error 1
make[2]: Leaving directory '/root/libosmo-dsp/src'
Makefile:475: recipe for target 'all-recursive' failed
make[1]: *** [all-recursive] Error 1
make[1]: Leaving directory '/root/libosmo-dsp'
Makefile:360: recipe for target 'all' failed
make: *** [all] Error 2

gcc-4.8.0 miscompiles fftw --enable-long-double on amd64

It appears that gcc-4.8.0 generates incorrect code when compiling file dft/scalar/codelets/q1_6.c on amd64 in long-double precision. gcc-4.8.1 and newer generate correct code.

To reproduce the bug:

./configure --enable-long-double CC=/tmp/local/bin/x86_64-unknown-linux-gnu-gcc-4.8.0
make
tests/bench -oparanoid --verify 'i72'

If the output starts with "Found relative error 3.333333e-01 (impulse 1)" then you encountered the bug; otherwise you are ok.

This is not a fftw bug, but a gcc bug that was fixed a long time ago, so I will close this issue. The solution for gcc-4.8.0 users (if any) is to upgrade to 4.8.1 or later.

dft/simd/sse2 is compiled even when --disable-sse2 was given to configure

The build -- both in 3.3.3 and 3.3.4 -- attempts to compile the content of the sse2-directory even when configure was explicitly asked to disable sse2:

--enable-shared --enable-threads --disable-fortran --disable-openmp --enable-float --enable-sse --disable-sse2

When the -march argument is set to a CPU that has no SSE2 instructions (such as "athlon-xp"), some compilers -- such as clang -- fail:

cc -DHAVE_CONFIG_H -I. -I../../.. -I../../../kernel -I../../../dft -I../../../dft/simd -I../../../simd-support -msse -O2 -pipe -march=athlon-xp -fstack-protector -fno-strict-aliasing -MT n1fv_2.lo -MD -MP -MF .deps/n1fv_2.Tpo -c n1fv_2.c -fPIC -DPIC -o .libs/n1fv_2.o
fatal error: error in backend: Do not know how to split the result of this operator!
cc: error: clang frontend command failed with exit code 70 (use -v to see invocation)
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
Target: i386-unknown-freebsd10.1

Looking into the configure script, I see the following line:

if test "$have_sse" = "yes"; then have_sse2=yes; fi

Huh?

IBM POWER 8 arch support

Hello!

There has been news that IBM would (help) optimize FFTW3 for their new hardware architecture. How is that going? Would the AltiVec code for older POWER hardware work on this new architecture?

TIA,
Fabricio

fftw-wisdom manpage needs an update (patch attached)

My distribution recently ran a manpage-versus-help tool over all the packages. fftw came out almost clean:

"""
In man of fftw-wisdom and fftw{f,l,q}-wisdom is missing (compared to help):
-T --threads
"""

N and n, what number to use?

Hi guys, there's some info about plans here:
http://www.fftw.org/fftw3_doc/Real_002dto_002dReal-Transform-Kinds.html#Real_002dto_002dReal-Transform-Kinds

The question is: is there a function that returns the N and n values for a given transform? If not, could we add one?
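For readers unfamiliar with the distinction on that manual page: n is the physical length of the array, while N is the "logical" size of the DFT that the real-to-real transform corresponds to. A minimal sketch of such a helper follows, assuming the relationships listed on the linked page. The enum and the function name `logical_size` are hypothetical stand-ins and are not part of the FFTW API.

```c
#include <assert.h>

/* Hypothetical stand-ins for the fftw_r2r_kind values; illustrative only,
   not taken from fftw3.h. */
typedef enum {
    KIND_R2HC, KIND_HC2R, KIND_DHT,
    KIND_REDFT00, KIND_REDFT01, KIND_REDFT10, KIND_REDFT11,
    KIND_RODFT00, KIND_RODFT01, KIND_RODFT10, KIND_RODFT11
} r2r_kind;

/* Returns the logical DFT size N for a physical array length n,
   following the table on the "Real-to-Real Transform Kinds" page. */
long logical_size(r2r_kind kind, long n)
{
    assert(n > 0);
    switch (kind) {
    case KIND_R2HC:
    case KIND_HC2R:
    case KIND_DHT:
        return n;            /* halfcomplex and Hartley transforms: N = n */
    case KIND_REDFT00:
        return 2 * (n - 1);  /* DCT-I */
    case KIND_RODFT00:
        return 2 * (n + 1);  /* DST-I */
    default:
        return 2 * n;        /* all remaining REDFT and RODFT kinds */
    }
}
```

A typical use would be normalization: the transforms are unnormalized, so dividing by `logical_size(kind, n)` after a transform and its inverse should recover the original data.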

Segmentation fault in distributed dft depends on number of processes

Hello all!

I use the MPI version of FFTW3 (3.3.4) in my project and found that it fails for certain numbers of processes.

After some investigation I reduced the problem to the following test code:

#include <stdio.h>
#include <mpi.h>
#include <fftw3-mpi.h>

int main(int argc, char **argv){
    const ptrdiff_t N[2]={448,352};   /* dimensions of the 2D transform */
    ptrdiff_t Nz=354;                 /* howmany: number of transforms */
    fftw_plan plan;
    fftw_complex *data;
    ptrdiff_t alloc_local, local_n0, local_0_start, block;
    int me,np;

    MPI_Init(&argc, &argv);
    fftw_mpi_init();
    MPI_Comm_rank(MPI_COMM_WORLD, &me);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    block=FFTW_MPI_DEFAULT_BLOCK;
    alloc_local = fftw_mpi_local_size_many(2,N,Nz,block,MPI_COMM_WORLD,&local_n0,&local_0_start);
    data = fftw_alloc_complex(alloc_local);
    if (me==0){
        printf("Test is running at %d processes \n",np);
        printf("Before plan calculation\n");
    }
    /* The crash happens inside this call for some process counts. */
    plan = fftw_mpi_plan_many_dft(2,N,Nz,block,block,data,data,MPI_COMM_WORLD,FFTW_FORWARD,FFTW_PATIENT);
    if (me==0){
        printf("After plan calculation\n");
    }
    fftw_destroy_plan(plan);
    fftw_free(data);
    MPI_Finalize();
    return 0;
}

This is a 2D Fourier transform over interleaved data. The transform dimensions are 448 x 352 and the howmany parameter is 354.

This code works fine when run on 2, 4, 7, 8, 14, 28, 32, 64 and 448 processes.
But for 56, 112 and 224 processes I get a segmentation fault during the plan calculation.

I checked it on the following configurations:

x86_64 with different versions of openmpi: 1.5.5, 1.6.5 and 1.8.4.
gcc 4.4.7
icc 13.1.0
Bluegene/p
bgxlc_r 9.0

The code behavior is the same.

Overlapped MPI_Alltoall(v) buffers

In mpi/transpose_alltoall.c, lines 65 and 78, could it happen that I and O are the same buffer? If that happens, then on MPICH2 (e.g. XC30) the code would trigger an error that requires setting MPICH_NO_BUFFER_ALIAS_CHECK to work around.
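As background, my understanding is that MPI does not allow the send and receive buffers of a collective such as MPI_Alltoall to alias; an in-place exchange is requested with MPI_IN_PLACE instead. A debug-time guard for the situation asked about could look roughly like the sketch below. Both function names are hypothetical, nothing here is taken from transpose_alltoall.c, and comparing addresses through `uintptr_t` is implementation-defined, although it behaves as expected on common flat-address platforms.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of FFTW or MPI): returns 1 if the byte
   ranges [a, a + na) and [b, b + nb) share at least one byte, else 0. */
int buffers_overlap(const void *a, size_t na, const void *b, size_t nb)
{
    uintptr_t a0 = (uintptr_t)a;
    uintptr_t b0 = (uintptr_t)b;

    if (na == 0 || nb == 0)
        return 0;  /* an empty range overlaps nothing */
    return a0 < b0 + nb && b0 < a0 + na;
}

/* Hypothetical guard to call before handing two buffers to a collective;
   aborts in debug builds if the buffers alias. */
void check_not_aliased(const void *in, const void *out, size_t nbytes)
{
    assert(!buffers_overlap(in, nbytes, out, nbytes));
}
```

If the check fires, the caller would have to either copy into a scratch buffer first or switch to the in-place form of the collective.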

Disable SIMD at runtime?

Is there a way to disable, e.g., AVX, during plan creation, i.e., have the planner ignore specific SIMD codelets during plan generation?

fftw-3.3.3 has the wrong shared-library revision number

FFTW uses the normal libtool mechanism for versioning shared libraries. Unfortunately, we forgot to update the libtool version string when shipping fftw-3.3.3, and consequently the 3.3.3 shared library has the same version number as fftw-3.3.2. This may cause difficulties to people upgrading FFTW while the FFTW shared library is in use, and may cause confusion about which version is installed.

The purpose of this "issue" is to document the problem. No fix is planned for 3.3.3. Future releases will (hopefully) have the correct version number.

is mpi wisdom cumulative?

I'm trying to save MPI wisdom, and I'm getting strange behavior.
Since my understanding is that fftw wisdom accumulates, I figured the easiest thing I can do is create an empty file, then read wisdom from there, run my code, save wisdom in the file, and then the next time I would run my code, it would no longer spend any time with plans.
However, that's not what happens.

In my tests I am using an MPI code on my 8 core machine, and I am doing an inverse FFT for 3 interleaved 256x256x256 arrays.
After the first run with "FFTW_MEASURE" the wisdom file is big (64 lines), and it takes ~100 seconds for this first run.
Second run takes 5 seconds, and the wisdom file is smaller (18 lines).
Third run takes ~100 seconds, and the wisdom file remains the same (I guess, I didn't check all the codes individually).
All subsequent runs are of ~100 seconds.

If I change my code so that it no longer overwrites the wisdom file after the first run, all subsequent runs are fast, so I can work around this issue.
Also, for this particular size of the transform, FFTW_ESTIMATE seems to be just as fast as FFTW_MEASURE (if I don't overwrite the wisdom file), so it's not a big deal anyway.
However, when I'll be running my production jobs, the transforms will be a lot bigger, and I'd like to understand what's happening.

By the way, I'm using the recommended way of saving the files from http://www.fftw.org/doc/FFTW-MPI-Wisdom.html (more on that in the next issue though).

GCC 4.7 produces bad code on Windows 32-bit

I get sigsegv when using fftw compiled using

g++ (tdm-1) 4.7.1
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR  PURPOSE.

on 32-bit windows.

Is this related to these release notes?

Removed an archaic stack-alignment hack that was failing with gcc-4.7/i386. Added stack-alignment hack necessary for gcc on Windows/i386. We will regret this in ten years (see previous change).

How should the library be compiled on this platform?

documentation info

Hi guys, we are creating a PHP binding for fftw3 ( https://github.com/bukka/php-fftw ), and I have some doubts about what these constants mean:

#define FFTW_NO_TIMELIMIT (-1.0) -> I still have to find where this is used; it is probably meant for the "fftw_set_timelimit" function, is that right?

I also don't know what the following constants are. I think they are plan flags, but I am not sure what each one does:

#define FFTW_CONSERVE_MEMORY (1U << 2)

/* undocumented beyond-guru flags */
#define FFTW_ESTIMATE_PATIENT (1U << 7)
#define FFTW_BELIEVE_PCOST (1U << 8)
#define FFTW_NO_DFT_R2HC (1U << 9)
#define FFTW_NO_NONTHREADED (1U << 10)
#define FFTW_NO_BUFFERING (1U << 11)
#define FFTW_NO_INDIRECT_OP (1U << 12)
#define FFTW_ALLOW_LARGE_GENERIC (1U << 13) /* NO_LARGE_GENERIC is default */
#define FFTW_NO_RANK_SPLITS (1U << 14)
#define FFTW_NO_VRANK_SPLITS (1U << 15)
#define FFTW_NO_VRECURSE (1U << 16)
#define FFTW_NO_SIMD (1U << 17)
#define FFTW_NO_SLOW (1U << 18)
#define FFTW_NO_FIXED_RADIX_LARGE_N (1U << 19)
#define FFTW_ALLOW_PRUNING (1U << 20)
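Each value quoted above is a single-bit mask, which is consistent with these being planner flags that are combined with bitwise OR. For a binding, a small decoder can help when debugging which bits a caller has set. The sketch below copies the name/bit pairs from the list in this issue; the struct, the table and the function `describe_flags` are hypothetical and not part of fftw3.h, and it says nothing about what each flag actually does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Name/bit pairs copied from the defines quoted in this issue. */
struct flag_name { const char *name; unsigned bit; };

static const struct flag_name flag_names[] = {
    { "FFTW_CONSERVE_MEMORY",        1U << 2  },
    { "FFTW_ESTIMATE_PATIENT",       1U << 7  },
    { "FFTW_BELIEVE_PCOST",          1U << 8  },
    { "FFTW_NO_DFT_R2HC",            1U << 9  },
    { "FFTW_NO_NONTHREADED",         1U << 10 },
    { "FFTW_NO_BUFFERING",           1U << 11 },
    { "FFTW_NO_INDIRECT_OP",         1U << 12 },
    { "FFTW_ALLOW_LARGE_GENERIC",    1U << 13 },
    { "FFTW_NO_RANK_SPLITS",         1U << 14 },
    { "FFTW_NO_VRANK_SPLITS",        1U << 15 },
    { "FFTW_NO_VRECURSE",            1U << 16 },
    { "FFTW_NO_SIMD",                1U << 17 },
    { "FFTW_NO_SLOW",                1U << 18 },
    { "FFTW_NO_FIXED_RADIX_LARGE_N", 1U << 19 },
    { "FFTW_ALLOW_PRUNING",          1U << 20 }
};

/* Prints the name of every listed flag that is set in `flags`, one per
   line, and returns how many were found. Bits not in the table are ignored. */
int describe_flags(unsigned flags, FILE *out)
{
    int count = 0;
    size_t i;

    assert(out != NULL);
    for (i = 0; i < sizeof flag_names / sizeof flag_names[0]; ++i) {
        if (flags & flag_names[i].bit) {
            fprintf(out, "%s\n", flag_names[i].name);
            ++count;
        }
    }
    return count;
}
```

For example, `describe_flags((1U << 2) | (1U << 17), stderr)` would list FFTW_CONSERVE_MEMORY and FFTW_NO_SIMD.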

support AArch64 Neon SIMD

GCC does not support -mfpu flag for AArch64.

How to use functions provided by fftw3 correctly?

Hi,
I want to use the fftw3 library in a C++ project, so I call functions such as "fftwf_plan_dft_r2c" and "fftwf_plan_dft_c2r". Why do I get errors like "undefined reference to `fftwf_plan_dft_r2c'"?

Bug in MPI planner for Nx1 transforms

[Bug report from Damon Farnsworth from Cray:]

While running the mpi checks on the stock fftw 3.3.4 (I also tested against version 3.3 and saw the same failure) with
your bench tester I came across a failure (relative error) for certain problems. It seems to be isolated (as far as
I’ve seen) to complex problems (single or double precision; inplace or out-of-place; forward or backward) of sizes Nx1
where N is roughly 25 or greater, although some values of N will pass. This seems to only happen when the number of
ranks is greater than one. I give a couple of examples below.

Here’s a successful test, one mpi rank:

aprun -n1 ./mpi-bench.double.static.exe -v2 --verify obc98x1

planner time: 0.005592 s
(mpi-dft-serial
  (dft-ct-dit/7
    (dftw-direct-7/24 "t1bv_7_avx")
    (dft-direct-14-x7 "n1bv_14_avx")))
flops: 318 add, 162 mul, 156 fma
estimated cost: 792.000000, pcost = 0.000000
obc98x1 4.05599e-16 5.38317e-16 9.21329e-16

Here’s the same test but with two mpi ranks:

aprun -n2 ./mpi-bench.double.static.exe -v2 --verify obc98x1

planner time: 0.013054 s
(mpi-dft-rank1/2/last
  (mpi-dft-rank1-bigvec/contig
    (mpi-transpose-pairwise
      (null)
      (rdft-transpose-cut-2x25-x2
        (rdft-vrank>=1-x2/1
          (rdft-rank0-iter-ci/2-x23))
        (rdft-rank0-ip-sq/2-x2-x2))
      (null)
      (null))
    (dft-direct-2-x25 "n2bv_2_sse2")
    (mpi-transpose-pairwise
      (rdft-transpose-cut-25x2-x2
        (rdft-rank0-ip-sq/2-x2-x2)
        (rdft-rank0-iter-co/2-x23-x2))
      (rdft-nop)
      (rdft-nop)
      (null)))
  (dft-ct-dit/7
    (dftw-direct-7/12 "t1buv_7_sse2")
    (dft-indirect-before
      (dft-direct-7-x7 "n1bv_7_avx")
      (dft-r2hc-1
        (rdft-rank0-ip-sq/2-x7-x7))))
  (mpi-transpose-pairwise
    (rdft-rank0-iter-co/2-x25-x2)
    (rdft-nop)
    (rdft-nop)
    (null)))
flops: 430 add, 232 mul, 192 fma
estimated cost: 1933.283180, pcost = 8989.000000
Found relative error 1.030928e-02 (impulse 1)
Found relative error 1.030928e-02 (impulse)
Found relative error 1.261234e-01 (time shift)
Found relative error 1.225574e-01 (time shift)
Found relative error 1.386730e-01 (time shift)
Found relative error 1.259797e-01 (time shift)
Found relative error 9.564516e-02 (time shift)
Found relative error 1.640549e-01 (time shift)
Found relative error 1.029969e-01 (time shift)
Found relative error 1.051039e-01 (time shift)
Found relative error 1.261746e-01 (time shift)
Found relative error 1.214751e-01 (time shift)
Found relative error 4.337277e-02 (freq shift)
Found relative error 4.390274e-02 (freq shift)
Found relative error 4.652282e-02 (freq shift)
Found relative error 5.435396e-02 (freq shift)
Found relative error 4.417120e-02 (freq shift)
Found relative error 4.410327e-02 (freq shift)
Found relative error 4.349692e-02 (freq shift)
Found relative error 4.195425e-02 (freq shift)
Found relative error 4.465815e-02 (freq shift)
Found relative error 3.965394e-02 (freq shift)
obc98x1 3.2374e-16 0.0103093 0.164055

possible bug in fftwf_mpi_gather_wisdom()

I've just submitted issue #33, and I have a related bug report.
In short, I'm trying to save mpi wisdom, and I call fftw_mpi_gather_wisdom(), as per http://www.fftw.org/doc/FFTW-MPI-Wisdom.html.
However, I'm only using single precision in my code, and every other FFTW function I call is prefixed by fftwf. For mpi_gather_wisdom, if I try to prefix it with fftwf, I get an invalid communicator error.
Any ideas?

Have you thought about accelerating fftw with OpenCL?

Issue/bug with 1D FFT/IFFT along third dimension of 3D matrix when FFTW_MEASURE or FFTW_PATIENT are used (FFTW_ESTIMATE works)

I have a reproducible issue with performing a 1D FFT/IFFT along the outer dimension of a 3D matrix when FFTW_MEASURE or FFTW_PATIENT is used. FFTW_ESTIMATE works fine and produces the correct results. Transforms along the other two dimensions (inner and middle) also work fine, regardless of which of the three flags is used.

I have provided sample test code to reproduce the issue in this repo. The transform in question is on line 106 of the fftw_code.c file.

There is also an IPython notebook which verifies the output data. Please let me know if I can clarify anything or provide more information.
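The reporter's code is not reproduced here, so the following is only a sketch of how such a transform is commonly described to `fftw_plan_many_dft`, assuming a row-major n0 x n1 x n2 array and taking "outer" to mean the slowest-varying axis. The struct `many_layout` and the function `layout_for_axis` are hypothetical names; only the meaning of the n, howmany, stride and dist arguments comes from the FFTW advanced interface.

```c
#include <assert.h>

/* Hypothetical container for the layout arguments of fftw_plan_many_dft. */
typedef struct {
    int n;        /* length of each 1D transform */
    int howmany;  /* number of 1D transforms */
    int stride;   /* distance between successive elements of one transform */
    int dist;     /* distance between the first elements of successive transforms */
} many_layout;

/* Layout for 1D transforms along axis 0 (slowest) or axis 2 (fastest) of a
   row-major n0 x n1 x n2 array, where element (i0, i1, i2) lives at index
   (i0 * n1 + i1) * n2 + i2. */
many_layout layout_for_axis(int n0, int n1, int n2, int axis)
{
    many_layout l;

    assert(axis == 0 || axis == 2);
    if (axis == 0) {
        l.n = n0;
        l.howmany = n1 * n2;
        l.stride = n1 * n2;  /* stepping i0 skips a whole n1 x n2 slab */
        l.dist = 1;          /* neighbouring transforms start one element apart */
    } else {
        l.n = n2;
        l.howmany = n0 * n1;
        l.stride = 1;        /* elements of one transform are contiguous */
        l.dist = n2;         /* each transform starts one row later */
    }
    return l;
}
```

The middle axis is deliberately left out: its transforms do not start at evenly spaced offsets, so a single (stride, dist) pair cannot describe them, and one would loop over i0 or use the guru interface instead.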
