spfft's Introduction

SpFFT

SpFFT - A 3D FFT library for sparse frequency domain data written in C++ with support for MPI, OpenMP, CUDA and ROCm.

Inspired by the needs of some computational materials science applications with spherical cutoff data in the frequency domain, SpFFT provides Fast Fourier Transforms of sparse frequency domain data. For distributed computations with MPI, slab decomposition is used in the space domain and pencil decomposition in the frequency domain (all sparse data within a pencil / column must reside on one rank).

Fig. 1: Illustration of a transform, where data on each MPI rank is identified by color.

Design Goals

  • Sparse frequency domain input
  • Reuse of pre-allocated memory
  • Support for shifted indexing with centered zero-frequency
  • Optional parallelization and GPU acceleration
  • Unified interface for calculations on CPUs and GPUs
  • Support of Complex-To-Real and Real-To-Complex transforms, where the full hermitian symmetry property is utilized
  • C++, C and Fortran interfaces

Interface Design

To allow for pre-allocation and reuse of memory, the design is based on two classes:

  • Grid: Provides memory for transforms up to a given size.
  • Transform: Created with information on sparse input data and is associated with a Grid. Maximum size is limited by Grid dimensions. Internal reference counting to Grid objects guarantees a valid state until Transform object destruction.

A transform can be computed in-place or out-of-place. Additionally, an internally allocated work buffer can optionally be used for input / output of space domain data.

New Features in v1.0

  • Support for externally allocated memory for space domain data including in-place and out-of-place transforms
  • Optional asynchronous computation when using GPUs
  • Simplified / direct transform handle creation if no resource reuse through grid handles is required

Documentation

Documentation can be found here.

Requirements

  • C++ Compiler with C++17 support. Supported compilers are:
    • GCC 7 and later
    • Clang 5 and later
    • ICC 19.0 and later
  • CMake 3.18 and later (3.21 for ROCm)
  • Library providing a FFTW 3.x interface (FFTW3 or Intel MKL)
  • For multi-threading: OpenMP support by the compiler
  • For compilation with GPU support:
    • CUDA 11.0 and later for Nvidia hardware
    • ROCm 5.0 and later for AMD hardware

Installation

The build system follows the standard CMake workflow. Example:

mkdir build
cd build
cmake .. -DSPFFT_OMP=ON -DSPFFT_MPI=ON -DSPFFT_GPU_BACKEND=CUDA -DSPFFT_SINGLE_PRECISION=OFF -DCMAKE_INSTALL_PREFIX=/usr/local
make -j8 install

CMake options

Option                   Default   Description
SPFFT_MPI                ON        Enable MPI support
SPFFT_OMP                ON        Enable multi-threading with OpenMP
SPFFT_GPU_BACKEND        OFF       Select GPU backend. Can be OFF, CUDA or ROCM
SPFFT_GPU_DIRECT         OFF       Use GPU aware MPI with GPUDirect
SPFFT_SINGLE_PRECISION   OFF       Enable single precision support
SPFFT_STATIC             OFF       Build as static library
SPFFT_FFTW_LIB           AUTO      Library providing a FFTW interface. Can be AUTO, MKL or FFTW
SPFFT_BUILD_TESTS        OFF       Build test executables for development purposes
SPFFT_INSTALL            ON        Add library to install target
SPFFT_FORTRAN            OFF       Build Fortran interface module
SPFFT_BUNDLED_LIBS       ON        Download required libraries for building tests

NOTE: When compiling with CUDA or ROCM (HIP), the standard CMAKE_CUDA_ARCHITECTURES and CMAKE_HIP_ARCHITECTURES options should be defined as well. HIP_HCC_FLAGS is no longer in use.

Examples

Further examples for C++, C and Fortran can be found in the "examples" folder.

#include <complex>
#include <iostream>
#include <vector>

#include "spfft/spfft.hpp"

int main(int argc, char** argv) {
  const int dimX = 2;
  const int dimY = 2;
  const int dimZ = 2;

  std::cout << "Dimensions: x = " << dimX << ", y = " << dimY << ", z = " << dimZ << std::endl
            << std::endl;

  // Use default OpenMP value
  const int numThreads = -1;

  // Use all elements in this example.
  const int numFrequencyElements = dimX * dimY * dimZ;

  // Slice length in space domain. Equivalent to dimZ for non-distributed case.
  const int localZLength = dimZ;

  // Interleaved complex numbers
  std::vector<double> frequencyElements;
  frequencyElements.reserve(2 * numFrequencyElements);

  // Indices of frequency elements
  std::vector<int> indices;
  indices.reserve(dimX * dimY * dimZ * 3);

  // Initialize frequency domain values and indices
  double initValue = 0.0;
  for (int xIndex = 0; xIndex < dimX; ++xIndex) {
    for (int yIndex = 0; yIndex < dimY; ++yIndex) {
      for (int zIndex = 0; zIndex < dimZ; ++zIndex) {
        // init with interleaved complex numbers
        frequencyElements.emplace_back(initValue);
        frequencyElements.emplace_back(-initValue);

        // add index triplet for value
        indices.emplace_back(xIndex);
        indices.emplace_back(yIndex);
        indices.emplace_back(zIndex);

        initValue += 1.0;
      }
    }
  }

  std::cout << "Input:" << std::endl;
  for (int i = 0; i < numFrequencyElements; ++i) {
    std::cout << frequencyElements[2 * i] << ", " << frequencyElements[2 * i + 1] << std::endl;
  }

  // Create local Grid. For distributed computations, an MPI communicator has to be provided.
  spfft::Grid grid(dimX, dimY, dimZ, dimX * dimY, SPFFT_PU_HOST, numThreads);

  // Create transform.
  // Note: A transform handle can be created without a grid if no resource sharing is desired.
  spfft::Transform transform =
      grid.create_transform(SPFFT_PU_HOST, SPFFT_TRANS_C2C, dimX, dimY, dimZ, localZLength,
                            numFrequencyElements, SPFFT_INDEX_TRIPLETS, indices.data());


  ///////////////////////////////////////////////////
  // Option A: Reuse internal buffer for space domain
  ///////////////////////////////////////////////////

  // Transform backward
  transform.backward(frequencyElements.data(), SPFFT_PU_HOST);

  // Get pointer to buffer with space domain data. Is guaranteed to be castable to a valid
  // std::complex pointer. Using the internal working buffer as input / output can help reduce
  // memory usage.
  double* spaceDomainPtr = transform.space_domain_data(SPFFT_PU_HOST);

  std::cout << std::endl << "After backward transform:" << std::endl;
  for (int i = 0; i < transform.local_slice_size(); ++i) {
    std::cout << spaceDomainPtr[2 * i] << ", " << spaceDomainPtr[2 * i + 1] << std::endl;
  }

  /////////////////////////////////////////////////
  // Option B: Use external buffer for space domain
  /////////////////////////////////////////////////

  std::vector<double> spaceDomainVec(2 * transform.local_slice_size());

  // Transform backward
  transform.backward(frequencyElements.data(), spaceDomainVec.data());

  // Transform forward
  transform.forward(spaceDomainVec.data(), frequencyElements.data(), SPFFT_NO_SCALING);

  // Note: In-place transforms are also supported by passing the same pointer for input and output.

  std::cout << std::endl << "After forward transform (without normalization):" << std::endl;
  for (int i = 0; i < numFrequencyElements; ++i) {
    std::cout << frequencyElements[2 * i] << ", " << frequencyElements[2 * i + 1] << std::endl;
  }

  return 0;
}

Acknowledgements

This work was supported by:

  • Swiss Federal Institute of Technology in Zurich (ETH Zurich)
  • Swiss National Supercomputing Centre (CSCS)
  • MAterials design at the eXascale (MaX; Horizon2020, grant agreement MaX CoE, No. 824143)

spfft's People

Contributors

adhocman, haampie, ltalirz

spfft's Issues

Building SpFFT with Intel 18.0.5 fails

While newer Intel compiler versions (19.1.x and 2021.x) build SpFFT without problems, the older version
icpc version 18.0.5 (gcc version 9.3.0 compatibility)
returns the following error:

cd /data/user/krack/software/SpFFT-1.0.3/build-cpu/src && /opt/psi/Programming/intel/18.4/compilers_and_libraries_2018.5.274/linux/mpi/intel64/bin/mpiicpc  -I/data/user/krack/software/SpFFT-1.0.3/ext -I/data/user/krack/software/SpFFT-1.0.3/include -I/data/user/krack/software/SpFFT-1.0.3/src -I/data/user/krack/software/SpFFT-1.0.3/build-cpu -isystem /opt/psi/Programming/intel/18.4/compilers_and_libraries_2018.5.274/linux/mkl/include -isystem /opt/psi/Programming/intel/18.4/compilers_and_libraries_2018.5.274/linux/mkl/include/fftw -O2 -fPIC -fopenmp -fp-model precise -funroll-loops -g -traceback -xHost -fvisibility=hidden -qopenmp -std=gnu++11 -MD -MT src/CMakeFiles/spfft.dir/execution/execution_host.cpp.o -MF CMakeFiles/spfft.dir/execution/execution_host.cpp.o.d -o CMakeFiles/spfft.dir/execution/execution_host.cpp.o -c /data/user/krack/software/SpFFT-1.0.3/src/execution/execution_host.cpp
/afs/psi.ch/sys/psi.merlin/Programming/gcc/9.3.0/bin/../include/c++/9.3.0/ext/new_allocator.h(146): error: function "std::pair<_T1, _T2>::pair(const std::pair<_T1, _T2> &) [with _T1=const std::tuple<bool, int, int>, _T2=spfft::FFTWPlan<double>]" (declared at line 303 of "/afs/psi.ch/sys/psi.merlin/Programming/gcc/9.3.0/bin/../include/c++/9.3.0/bits/stl_pair.h") cannot be referenced -- it is a deleted function
                            _Up(std::forward<_Args>(__args)...)))
                               ^
          detected during:
            instantiation of "void __gnu_cxx::new_allocator<_Tp>::construct(_Up *, _Args &&...) [with _Tp=std::__detail::_Hash_node<std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>>, true>, _Up=std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>>, _Args=<const std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>> &>]" at line 483 of "/afs/psi.ch/sys/psi.merlin/Programming/gcc/9.3.0/bin/../include/c++/9.3.0/bits/alloc_traits.h"
            instantiation of "void std::allocator_traits<std::allocator<_Tp>>::construct(std::allocator_traits<std::allocator<_Tp>>::allocator_type &, _Up *, _Args &&...) [with _Tp=std::__detail::_Hash_node<std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>>, true>, _Up=std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>>, _Args=<const std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>> &>]" at line 2088 of
                      "/afs/psi.ch/sys/psi.merlin/Programming/gcc/9.3.0/bin/../include/c++/9.3.0/bits/hashtable_policy.h"
            instantiation of "std::__detail::_Hashtable_alloc<_NodeAlloc>::__node_type *std::__detail::_Hashtable_alloc<_NodeAlloc>::_M_allocate_node(_Args &&...) [with _NodeAlloc=std::__alloc_rebind<std::allocator<std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>>>, std::__detail::_Hash_node<std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>>, true>>, _Args=<const std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>> &>]" at line 1243 of
                      "/afs/psi.ch/sys/psi.merlin/Programming/gcc/9.3.0/bin/../include/c++/9.3.0/bits/hashtable.h"
            instantiation of "std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::_Hashtable(const std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits> &) [with _Key=std::tuple<bool, int, int>, _Value=std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>>, _Alloc=std::allocator<std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>>>, _ExtractKey=std::__detail::_Select1st,
                      _Equal=std::equal_to<std::tuple<bool, int, int>>, _H1=spfft::FFTWPropHash, _H2=std::__detail::_Mod_range_hashing, _Hash=std::__detail::_Default_ranged_hash, _RehashPolicy=std::__detail::_Prime_rehash_policy, _Traits=std::__umap_traits<true>]" at line 75 of "/afs/psi.ch/sys/psi.merlin/Programming/gcc/9.3.0/bin/../include/c++/9.3.0/bits/stl_construct.h"
            instantiation of class "std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc> [with _Key=std::tuple<bool, int, int>, _Tp=spfft::FFTWPlan<double>, _Hash=spfft::FFTWPropHash, _Pred=std::equal_to<std::tuple<bool, int, int>>, _Alloc=std::allocator<std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>>>]" at line 75 of "/afs/psi.ch/sys/psi.merlin/Programming/gcc/9.3.0/bin/../include/c++/9.3.0/bits/stl_construct.h"
            [ 8 instantiation contexts not shown ]
            instantiation of "_ForwardIterator std::__uninitialized_copy_a(_InputIterator, _InputIterator, _ForwardIterator, std::allocator<_Tp> &) [with _InputIterator=const std::tuple<spfft::FlexibleFFTWPlan<double>, spfft::SizeType={unsigned long long}, spfft::SizeType={unsigned long long}> *, _ForwardIterator=std::tuple<spfft::FlexibleFFTWPlan<double>, spfft::SizeType={unsigned long long}, spfft::SizeType={unsigned long long}> *, _Tp=std::tuple<spfft::FlexibleFFTWPlan<double>,
                      spfft::SizeType={unsigned long long}, spfft::SizeType={unsigned long long}>]" at line 1512 of "/afs/psi.ch/sys/psi.merlin/Programming/gcc/9.3.0/bin/../include/c++/9.3.0/bits/stl_vector.h"
            instantiation of "std::vector<_Tp, _Alloc>::pointer std::vector<_Tp, _Alloc>::_M_allocate_and_copy(std::vector<_Tp, _Alloc>::size_type={std::size_t={unsigned long}}, _ForwardIterator, _ForwardIterator) [with _Tp=std::tuple<spfft::FlexibleFFTWPlan<double>, spfft::SizeType={unsigned long long}, spfft::SizeType={unsigned long long}>, _Alloc=std::allocator<std::tuple<spfft::FlexibleFFTWPlan<double>, spfft::SizeType={unsigned long long}, spfft::SizeType={unsigned long long}>>,
                      _ForwardIterator=const std::tuple<spfft::FlexibleFFTWPlan<double>, spfft::SizeType={unsigned long long}, spfft::SizeType={unsigned long long}> *]" at line 87 of "/afs/psi.ch/sys/psi.merlin/Programming/gcc/9.3.0/bin/../include/c++/9.3.0/bits/vector.tcc"
            instantiation of "void std::vector<_Tp, _Alloc>::reserve(std::vector<_Tp, _Alloc>::size_type={std::size_t={unsigned long}}) [with _Tp=std::tuple<spfft::FlexibleFFTWPlan<double>, spfft::SizeType={unsigned long long}, spfft::SizeType={unsigned long long}>, _Alloc=std::allocator<std::tuple<spfft::FlexibleFFTWPlan<double>, spfft::SizeType={unsigned long long}, spfft::SizeType={unsigned long long}>>]" at line 105 of "/data/user/krack/software/SpFFT-1.0.3/src/fft/transform_1d_host.hpp"
            instantiation of "spfft::Transform1DPlanesHost<T>::Transform1DPlanesHost(spfft::HostArrayView3D<spfft::Transform1DPlanesHost<T>::ComplexType>, spfft::HostArrayView3D<spfft::Transform1DPlanesHost<T>::ComplexType>, bool, bool, int, int) [with T=double]" at line 77 of "/data/user/krack/software/SpFFT-1.0.3/src/execution/execution_host.cpp"
            instantiation of "spfft::ExecutionHost<T>::ExecutionHost(int, std::shared_ptr<spfft::Parameters>, spfft::HostArray<std::complex<T>> &, spfft::HostArray<std::complex<T>> &) [with T=double]" at line 360 of "/data/user/krack/software/SpFFT-1.0.3/src/execution/execution_host.cpp"

/afs/psi.ch/sys/psi.merlin/Programming/gcc/9.3.0/bin/../include/c++/9.3.0/ext/new_allocator.h(147): error: function "std::pair<_T1, _T2>::pair(const std::pair<_T1, _T2> &) [with _T1=const std::tuple<bool, int, int>, _T2=spfft::FFTWPlan<double>]" (declared at line 303 of "/afs/psi.ch/sys/psi.merlin/Programming/gcc/9.3.0/bin/../include/c++/9.3.0/bits/stl_pair.h") cannot be referenced -- it is a deleted function
        { ::new((void *)__p) _Up(std::forward<_Args>(__args)...); }
                                ^
          detected during:
            instantiation of "void __gnu_cxx::new_allocator<_Tp>::construct(_Up *, _Args &&...) [with _Tp=std::__detail::_Hash_node<std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>>, true>, _Up=std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>>, _Args=<const std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>> &>]" at line 484 of "/afs/psi.ch/sys/psi.merlin/Programming/gcc/9.3.0/bin/../include/c++/9.3.0/bits/alloc_traits.h"
            instantiation of "void std::allocator_traits<std::allocator<_Tp>>::construct(std::allocator_traits<std::allocator<_Tp>>::allocator_type &, _Up *, _Args &&...) [with _Tp=std::__detail::_Hash_node<std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>>, true>, _Up=std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>>, _Args=<const std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>> &>]" at line 2088 of
                      "/afs/psi.ch/sys/psi.merlin/Programming/gcc/9.3.0/bin/../include/c++/9.3.0/bits/hashtable_policy.h"
            instantiation of "std::__detail::_Hashtable_alloc<_NodeAlloc>::__node_type *std::__detail::_Hashtable_alloc<_NodeAlloc>::_M_allocate_node(_Args &&...) [with _NodeAlloc=std::__alloc_rebind<std::allocator<std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>>>, std::__detail::_Hash_node<std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>>, true>>, _Args=<const std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>> &>]" at line 1243 of
                      "/afs/psi.ch/sys/psi.merlin/Programming/gcc/9.3.0/bin/../include/c++/9.3.0/bits/hashtable.h"
            instantiation of "std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits>::_Hashtable(const std::_Hashtable<_Key, _Value, _Alloc, _ExtractKey, _Equal, _H1, _H2, _Hash, _RehashPolicy, _Traits> &) [with _Key=std::tuple<bool, int, int>, _Value=std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>>, _Alloc=std::allocator<std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>>>, _ExtractKey=std::__detail::_Select1st,
                      _Equal=std::equal_to<std::tuple<bool, int, int>>, _H1=spfft::FFTWPropHash, _H2=std::__detail::_Mod_range_hashing, _Hash=std::__detail::_Default_ranged_hash, _RehashPolicy=std::__detail::_Prime_rehash_policy, _Traits=std::__umap_traits<true>]" at line 75 of "/afs/psi.ch/sys/psi.merlin/Programming/gcc/9.3.0/bin/../include/c++/9.3.0/bits/stl_construct.h"
            instantiation of class "std::unordered_map<_Key, _Tp, _Hash, _Pred, _Alloc> [with _Key=std::tuple<bool, int, int>, _Tp=spfft::FFTWPlan<double>, _Hash=spfft::FFTWPropHash, _Pred=std::equal_to<std::tuple<bool, int, int>>, _Alloc=std::allocator<std::pair<const std::tuple<bool, int, int>, spfft::FFTWPlan<double>>>]" at line 75 of "/afs/psi.ch/sys/psi.merlin/Programming/gcc/9.3.0/bin/../include/c++/9.3.0/bits/stl_construct.h"
            [ 8 instantiation contexts not shown ]
            instantiation of "_ForwardIterator std::__uninitialized_copy_a(_InputIterator, _InputIterator, _ForwardIterator, std::allocator<_Tp> &) [with _InputIterator=const std::tuple<spfft::FlexibleFFTWPlan<double>, spfft::SizeType={unsigned long long}, spfft::SizeType={unsigned long long}> *, _ForwardIterator=std::tuple<spfft::FlexibleFFTWPlan<double>, spfft::SizeType={unsigned long long}, spfft::SizeType={unsigned long long}> *, _Tp=std::tuple<spfft::FlexibleFFTWPlan<double>,
                      spfft::SizeType={unsigned long long}, spfft::SizeType={unsigned long long}>]" at line 1512 of "/afs/psi.ch/sys/psi.merlin/Programming/gcc/9.3.0/bin/../include/c++/9.3.0/bits/stl_vector.h"
            instantiation of "std::vector<_Tp, _Alloc>::pointer std::vector<_Tp, _Alloc>::_M_allocate_and_copy(std::vector<_Tp, _Alloc>::size_type={std::size_t={unsigned long}}, _ForwardIterator, _ForwardIterator) [with _Tp=std::tuple<spfft::FlexibleFFTWPlan<double>, spfft::SizeType={unsigned long long}, spfft::SizeType={unsigned long long}>, _Alloc=std::allocator<std::tuple<spfft::FlexibleFFTWPlan<double>, spfft::SizeType={unsigned long long}, spfft::SizeType={unsigned long long}>>,
                      _ForwardIterator=const std::tuple<spfft::FlexibleFFTWPlan<double>, spfft::SizeType={unsigned long long}, spfft::SizeType={unsigned long long}> *]" at line 87 of "/afs/psi.ch/sys/psi.merlin/Programming/gcc/9.3.0/bin/../include/c++/9.3.0/bits/vector.tcc"
            instantiation of "void std::vector<_Tp, _Alloc>::reserve(std::vector<_Tp, _Alloc>::size_type={std::size_t={unsigned long}}) [with _Tp=std::tuple<spfft::FlexibleFFTWPlan<double>, spfft::SizeType={unsigned long long}, spfft::SizeType={unsigned long long}>, _Alloc=std::allocator<std::tuple<spfft::FlexibleFFTWPlan<double>, spfft::SizeType={unsigned long long}, spfft::SizeType={unsigned long long}>>]" at line 105 of "/data/user/krack/software/SpFFT-1.0.3/src/fft/transform_1d_host.hpp"
            instantiation of "spfft::Transform1DPlanesHost<T>::Transform1DPlanesHost(spfft::HostArrayView3D<spfft::Transform1DPlanesHost<T>::ComplexType>, spfft::HostArrayView3D<spfft::Transform1DPlanesHost<T>::ComplexType>, bool, bool, int, int) [with T=double]" at line 77 of "/data/user/krack/software/SpFFT-1.0.3/src/execution/execution_host.cpp"
            instantiation of "spfft::ExecutionHost<T>::ExecutionHost(int, std::shared_ptr<spfft::Parameters>, spfft::HostArray<std::complex<T>> &, spfft::HostArray<std::complex<T>> &) [with T=double]" at line 360 of "/data/user/krack/software/SpFFT-1.0.3/src/execution/execution_host.cpp"

compilation aborted for /data/user/krack/software/SpFFT-1.0.3/src/execution/execution_host.cpp (code 2)
make[2]: *** [src/CMakeFiles/spfft.dir/execution/execution_host.cpp.o] Error 2
make[2]: Leaving directory `/data/user/krack/software/SpFFT-1.0.3/build-cpu'
make[1]: *** [src/CMakeFiles/spfft.dir/all] Error 2
make[1]: Leaving directory `/data/user/krack/software/SpFFT-1.0.3/build-cpu'
make: *** [all] Error 2

Is that a compiler bug?

CMake can't find MKLSequential

To reproduce, run the code below in https://colab.research.google.com/ (Ubuntu terminal through the browser).

!apt-get install -y fftw3
!git clone https://github.com/eth-cscs/SpFFT
%cd SpFFT
%mkdir build
%cd build
!cmake .. -DSPFFT_OMP=ON -DSPFFT_MPI=ON -DSPFFT_GPU_BACKEND=CUDA -DSPFFT_SINGLE_PRECISION=OFF -DCMAKE_INSTALL_PREFIX=/usr/loca

Everything seems to work fine, except for MKLSequential. Does anyone have any suggestions for how to fix this?

SpFFT fails to find MPI with Cray wrappers on Alps

Hi,
I could not properly configure SpFFT release 1.0.4 with the flag -DSPFFT_MPI=ON on the Alps system (production partition: Eiger). I load the default CMake (3.20.1), cpeGNU/21.06 and cray-fftw, then I configure as follows:

cmake ../SpFFT-1.0.4 -DCMAKE_BUILD_TYPE=RELEASE -DSPFFT_FFTW_LIB=FFTW -DSPFFT_SINGLE_PRECISION=ON -DSPFFT_MPI=ON -DSPFFT_OMP=ON
-- The CXX compiler identification is GNU 10.3.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: /opt/gcc/10.3.0/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Could NOT find MPI_CXX (missing: MPI_CXX_LIB_NAMES MPI_CXX_HEADER_DIR MPI_CXX_WORKS)
CMake Error at /apps/eiger/UES/jenkins/1.4.0/software/CMake/3.20.1/share/cmake-3.20/Modules/FindPackageHandleStandardArgs.cmake:230 (message):
  Could NOT find MPI (missing: MPI_CXX_FOUND CXX)
Call Stack (most recent call first):
  /apps/eiger/UES/jenkins/1.4.0/software/CMake/3.20.1/share/cmake-3.20/Modules/FindPackageHandleStandardArgs.cmake:594 (_FPHSA_FAILURE_MESSAGE)
  /apps/eiger/UES/jenkins/1.4.0/software/CMake/3.20.1/share/cmake-3.20/Modules/FindMPI.cmake:1742 (find_package_handle_standard_args)
  CMakeLists.txt:156 (find_package)

-- Configuring incomplete, errors occurred!

Even manually defining the following variables does not change the configure error:

-DMPI_CXX_COMPILER=/opt/cray/pe/craype/2.7.8/bin/CC 
-DMPI_CXX_COMPILER_INCLUDE_DIRS=/opt/cray/pe/mpich/8.1.6/ofi/gnu/9.1/include/
-DMPI_CXX_HEADER_DIR=/opt/cray/pe/mpich/8.1.6/ofi/gnu/9.1/include

Any advice would be very much appreciated!

Always linked against fftw3f even when single precision is not requested

The consequence of the current logic

if(_FFTW_FLOAT_LIBRARY AND FFTW_FOUND)
list(APPEND FFTW_LIBRARIES ${_FFTW_FLOAT_LIBRARY})
set(FFTW_FLOAT_FOUND TRUE)
else()
set(FFTW_FLOAT_FOUND FALSE)
endif()

is that libspfft.so is always linked against libfftw3f.so if found even when single precision is not requested.

Furthermore I see libfftw3_threads to be linked despite spfft not calling fftw with threads (just a quick glance, I may be wrong):

$ readelf -d /project/d110/timuel/spack/opt/spack/cray-sles15-zen2/gcc-10.2.0/spfft-1.0.3-meadkxxxregbcgl43lrhm4tb26vrs6o3/lib64/libspfft.so.1

Dynamic section at offset 0x38c18 contains 41 entries:
  Tag        Type                         Name/Value
 0x0000000000000001 (NEEDED)             Shared library: [libfftw3f_omp.so.mpi31.3]
 0x0000000000000001 (NEEDED)             Shared library: [libfftw3_omp.so.mpi31.3]
 0x0000000000000001 (NEEDED)             Shared library: [libgomp.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libfftw3.so.mpi31.3]
 0x0000000000000001 (NEEDED)             Shared library: [libfftw3f.so.mpi31.3]
 0x0000000000000001 (NEEDED)             Shared library: [libfftw3f_mpi.so.mpi31.3]
 0x0000000000000001 (NEEDED)             Shared library: [libfftw3f_threads.so.mpi31.3]
 0x0000000000000001 (NEEDED)             Shared library: [libfftw3_mpi.so.mpi31.3]
 0x0000000000000001 (NEEDED)             Shared library: [libfftw3_threads.so.mpi31.3]
 0x0000000000000001 (NEEDED)             Shared library: [libmpi_gnu_91.so.12]
 0x0000000000000001 (NEEDED)             Shared library: [libxpmem.so.0]
 0x0000000000000001 (NEEDED)             Shared library: [libm.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libstdc++.so.6]
 0x0000000000000001 (NEEDED)             Shared library: [libgcc_s.so.1]
 0x0000000000000001 (NEEDED)             Shared library: [libc.so.6]
 0x000000000000000e (SONAME)             Library soname: [libspfft.so.1]
 0x000000000000000f (RPATH)              Library rpath: [/project/d110/timuel/spack/opt/spack/cray-sles15-zen2/gcc-10.2.0/spfft-1.0.3-meadkxxxregbcgl43lrhm4tb26vrs6o3/lib:/project/d110/timuel/spack/opt/spack/cray-sles15-zen2/gcc-10.2.0/spfft-1.0.3-meadkxxxregbcgl43lrhm4tb26vrs6o3/lib64:/opt/cray/pe/fftw/3.3.8.9/x86_rome/lib:/opt/cray/pe/mpich/8.1.4/ofi/gnu/9.1/lib:/opt/gcc/10.2.0/snos:/project/d110/timuel/spack/opt/spack/cray-sles15-zen2/gcc-10.2.0/spfft-1.0.3-meadkxxxregbcgl43lrhm4tb26vrs6o3/lib:/project/d110/timuel/spack/opt/spack/cray-sles15-zen2/gcc-10.2.0/spfft-1.0.3-meadkxxxregbcgl43lrhm4tb26vrs6o3/lib64:/opt/cray/pe/fftw/3.3.8.9/x86_rome/lib:/opt/cray/pe/mpich/8.1.4/ofi/gnu/9.1/lib]
 0x000000000000000c (INIT)               0x7dd8
 0x000000000000000d (FINI)               0x2fa68
 0x0000000000000019 (INIT_ARRAY)         0x2381b8
 0x000000000000001b (INIT_ARRAYSZ)       8 (bytes)
 0x000000000000001a (FINI_ARRAY)         0x2381c0
 0x000000000000001c (FINI_ARRAYSZ)       8 (bytes)
 0x0000000000000004 (HASH)               0x190
 0x000000006ffffef5 (GNU_HASH)           0x8c0
 0x0000000000000005 (STRTAB)             0x2638
 0x0000000000000006 (SYMTAB)             0xdc0
 0x000000000000000a (STRSZ)              9990 (bytes)
 0x000000000000000b (SYMENT)             24 (bytes)
 0x0000000000000003 (PLTGOT)             0x239000
 0x0000000000000002 (PLTRELSZ)           3456 (bytes)
 0x0000000000000014 (PLTREL)             RELA
 0x0000000000000017 (JMPREL)             0x7058
 0x0000000000000007 (RELA)               0x5078
 0x0000000000000008 (RELASZ)             8160 (bytes)
 0x0000000000000009 (RELAENT)            24 (bytes)
 0x000000006ffffffe (VERNEED)            0x4f48
 0x000000006fffffff (VERNEEDNUM)         5
 0x000000006ffffff0 (VERSYM)             0x4d3e
 0x000000006ffffff9 (RELACOUNT)          217
 0x0000000000000000 (NULL)               0x0

Installation of CMake modules/ into a different directory leads to a configuration error with SIRIUS

No matter what I set for CMAKE_PREFIX_PATH or CMAKE_MODULE_PATH, SIRIUS is unable to locate the FindFFTW.cmake module required by SpFFT:

[...]
-- Found LibXC: /data/tiziano/cp2k/tools/toolchain/install/libxc-5.1.4/lib/libxc.a (Required is at least version "4.0.0")
-- Found LibSPG: /data/tiziano/cp2k/tools/toolchain/install/spglib-1.16.0/lib/libsymspg.a
-- Found HDF5: /data/tiziano/cp2k/tools/toolchain/install/hdf5-1.12.0/lib/libhdf5.a;/usr/lib64/libz.so;/usr/lib64/libdl.so;/usr/lib64/libm.so (found version "1.12.0") found components: C
CMake Error at /data/tiziano/cp2k/tools/toolchain/install/cmake-3.18.5/share/cmake-3.18/Modules/CMakeFindDependencyMacro.cmake:47 (find_package):
  By not providing "FindFFTW.cmake" in CMAKE_MODULE_PATH this project has
  asked CMake to find a package configuration file provided by "FFTW", but
  CMake did not find one.

  Could not find a package configuration file provided by "FFTW" with any of
  the following names:

    FFTWConfig.cmake
    fftw-config.cmake

  Add the installation prefix of "FFTW" to CMAKE_PREFIX_PATH or set
  "FFTW_DIR" to a directory containing one of the above files.  If "FFTW"
  provides a separate development package or SDK, be sure it has been
  installed.
Call Stack (most recent call first):
  /data/tiziano/cp2k/tools/toolchain/install/SpFFT-1.0.3/lib/cmake/SpFFT/SpFFTStaticConfig.cmake:24 (find_dependency)
  /data/tiziano/cp2k/tools/toolchain/install/SpFFT-1.0.3/lib/cmake/SpFFT/SpFFTConfig.cmake:5 (include)
  CMakeLists.txt:138 (find_package)


-- Configuring incomplete, errors occurred!
See also "/data/tiziano/cp2k/tools/toolchain/build/SIRIUS-7.2.5/build/CMakeFiles/CMakeOutput.log".
ERROR: (./scripts/stage8/install_sirius.sh, line 139) Non-zero exit code detected.

Moving the modules/ directory from spfft/ to SpFFT/, where SpFFTTargets.cmake and friends are located, fixes the build error.

Using SpFFT inside of an OpenMP region

I am trying to use SpFFT inside of an OpenMP region and I keep getting segfaults. I modified one of your examples to show the issue. I just want to do the FFT in parallel over several independent frequency regions:

program main
    use iso_c_binding
    use spfft
    implicit none
    integer :: i, j, k, counter
    integer, parameter :: dimX = 2
    integer, parameter :: dimY = 2
    integer, parameter :: dimZ = 2
    integer, parameter :: maxNumLocalZColumns = dimX * dimY
    integer, parameter :: processingUnit = 1
    integer, parameter :: maxNumThreads = -1
    type(c_ptr) :: grid = c_null_ptr
    type(c_ptr) :: transform = c_null_ptr
    integer :: errorCode = 0
    integer, dimension(dimX * dimY * dimZ * 3):: indices = 0
    complex(C_DOUBLE_COMPLEX), dimension(dimX * dimY * dimZ, 1000):: frequencyElements
    complex(C_DOUBLE_COMPLEX), pointer :: spaceDomain(:,:,:)
    type(c_ptr) :: realValuesPtr


    counter = 0
    do k = 1, dimZ
        do j = 1, dimY
           do i = 1, dimX
             frequencyElements(counter + 1,:) = cmplx(counter, -counter)
             indices(counter * 3 + 1) = i - 1
             indices(counter * 3 + 2) = j - 1
             indices(counter * 3 + 3) = k - 1
             counter = counter + 1
            end do
        end do
    end do

    ! print input
    ! print *, "Input:"
    ! do i = 1, size(frequencyElements)
    !      print *, frequencyElements(i)
    ! end do


    ! create grid and transform

    !$OMP PARALLEL  default(none) &
    !$OMP private(i, errorcode, grid,realValuesPtr, transform)&
    !$OMP shared(indices, frequencyElements)
    errorCode = spfft_grid_create(grid, dimX, dimY, dimZ, maxNumLocalZColumns, processingUnit, maxNumThreads);
    if (errorCode /= SPFFT_SUCCESS) error stop
    errorCode = spfft_transform_create(transform, grid, processingUnit, 0, dimX, dimY, dimZ, dimZ,&
        size(frequencyElements,1), SPFFT_INDEX_TRIPLETS, indices)
    if (errorCode /= SPFFT_SUCCESS) error stop

    ! grid can be safely destroyed after creating all required transforms
    errorCode = spfft_grid_destroy(grid)
    if (errorCode /= SPFFT_SUCCESS) error stop

    ! set space domain array to use memory allocated by the library
    errorCode = spfft_transform_get_space_domain(transform, processingUnit, realValuesPtr)
    if (errorCode /= SPFFT_SUCCESS) error stop

    ! transform backward
    !$OMP DO
    do i=1,1000
        errorCode = spfft_transform_backward(transform, frequencyElements(:,i), processingUnit)
        if (errorCode /= SPFFT_SUCCESS) error stop
    enddo
    !$OMP end do
    !$OMP end PARALLEL
end

If I run it I get a segfault and a core dump. This is the backtrace of the core:

#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
#1  0x00007f45cc137859 in __GI_abort () at abort.c:79
#2  0x00007f45cc1a23ee in __libc_message (action=action@entry=do_abort, fmt=fmt@entry=0x7f45cc2cc285 "%s\n") at ../sysdeps/posix/libc_fatal.c:155
#3  0x00007f45cc1aa47c in malloc_printerr (str=str@entry=0x7f45cc2ce278 "malloc_consolidate(): invalid chunk size") at malloc.c:5347
#4  0x00007f45cc1aac58 in malloc_consolidate (av=av@entry=0x7f45a0000020) at malloc.c:4477
#5  0x00007f45cc1ace03 in _int_malloc (av=av@entry=0x7f45a0000020, bytes=bytes@entry=9328) at malloc.c:3699
#6  0x00007f45cc1adc5f in _int_memalign (av=av@entry=0x7f45a0000020, alignment=alignment@entry=32, bytes=bytes@entry=9248) at malloc.c:4684
#7  0x00007f45cc1b051c in _mid_memalign (address=<optimized out>, bytes=9248, alignment=32) at malloc.c:3312
#8  __GI___libc_memalign (alignment=<optimized out>, bytes=9248) at malloc.c:3261
#9  0x00007f45cbf2f3b9 in fftw_malloc_plain () from /lib/x86_64-linux-gnu/libfftw3.so.3
#10 0x00007f45cbf30b4f in ?? () from /lib/x86_64-linux-gnu/libfftw3.so.3
#11 0x00007f45cbf3a3a8 in fftw_kdft_register () from /lib/x86_64-linux-gnu/libfftw3.so.3
#12 0x00007f45cbf33460 in fftw_solvtab_exec () from /lib/x86_64-linux-gnu/libfftw3.so.3
#13 0x00007f45cbf36a3f in fftw_dft_conf_standard () from /lib/x86_64-linux-gnu/libfftw3.so.3
#14 0x00007f45cc00191d in fftw_configure_planner () from /lib/x86_64-linux-gnu/libfftw3.so.3
#15 0x00007f45cc005280 in fftw_the_planner () from /lib/x86_64-linux-gnu/libfftw3.so.3
#16 0x00007f45cc0016ae in fftw_mkapiplan () from /lib/x86_64-linux-gnu/libfftw3.so.3
#17 0x00007f45cc004e07 in fftw_plan_many_dft () from /lib/x86_64-linux-gnu/libfftw3.so.3
#18 0x00007f45cc65a89c in spfft::FFTWPlan<double>::FFTWPlan (this=0x7f45a0001120, input=0x7f45a0001000, output=0x7f45a0001000, size=2, istride=1, ostride=1, idist=2, odist=2, howmany=1, sign=1) at /home/matthias/libraries/SpFFT/src/fft/fftw_plan_1d.hpp:80
#19 0x00007f45cc662fe1 in __gnu_cxx::new_allocator<spfft::FFTWPlan<double> >::construct<spfft::FFTWPlan<double>, std::complex<double>*, std::complex<double>*, unsigned long long const&, unsigned long long const&, unsigned long long const&, unsigned long long const&, unsigned long long const&, unsigned long long const&, int&> (this=0x7f45a0000d88, __p=0x7f45a0001120) at /usr/include/c++/9/ext/new_allocator.h:147
#20 0x00007f45cc6615b0 in std::allocator_traits<std::allocator<spfft::FFTWPlan<double> > >::construct<spfft::FFTWPlan<double>, std::complex<double>*, std::complex<double>*, unsigned long long const&, unsigned long long const&, unsigned long long const&, unsigned long long const&, unsigned long long const&, unsigned long long const&, int&> (__a=..., __p=0x7f45a0001120) at /usr/include/c++/9/bits/alloc_traits.h:484
#21 0x00007f45cc65fc22 in std::vector<spfft::FFTWPlan<double>, std::allocator<spfft::FFTWPlan<double> > >::emplace_back<std::complex<double>*, std::complex<double>*, unsigned long long const&, unsigned long long const&, unsigned long long const&, unsigned long long const&, unsigned long long const&, unsigned long long const&, int&> (this=0x7f45a0000d88) at /usr/include/c++/9/bits/vector.tcc:115
#22 0x00007f45cc65d561 in spfft::Transform1DPlanesHost<double>::Transform1DPlanesHost (this=0x7f45a0000d80, inputData=..., outputData=..., transposeInputData=false, transposeOutputData=false, sign=1, maxNumThreads=12)
    at /home/matthias/libraries/SpFFT/src/fft/transform_1d_host.hpp:111
#23 0x00007f45cc65b38c in spfft::ExecutionHost<double>::ExecutionHost (this=0x7f45a0000f00, numThreads=12, param=std::shared_ptr<class spfft::Parameters> (use count 4, weak count 0) = {...}, array1=..., array2=...)
    at /home/matthias/libraries/SpFFT/src/execution/execution_host.cpp:77
#24 0x00007f45cc668cba in spfft::TransformInternal<double>::TransformInternal (this=0x7f45a0000ec0, executionUnit=SPFFT_PU_HOST, grid=std::shared_ptr<class spfft::GridInternal<double>> (empty) = {...}, param=std::shared_ptr<class spfft::Parameters> (empty) = {...})
    at /home/matthias/libraries/SpFFT/src/spfft/transform_internal.cpp:91
#25 0x00007f45cc66656c in spfft::Transform::Transform (this=0x7f45a0000c00, grid=std::shared_ptr<class spfft::GridInternal<double>> (use count 2, weak count 0) = {...}, processingUnit=SPFFT_PU_HOST, transformType=SPFFT_TRANS_C2C, dimX=2, dimY=2, dimZ=2,
    localZLength=2, numLocalElements=8, indexFormat=SPFFT_INDEX_TRIPLETS, indices=0x55fb923ec040 <indices>) at /home/matthias/libraries/SpFFT/src/spfft/transform.cpp:63
#26 0x00007f45cc66ab35 in spfft::Grid::create_transform (this=0x7f45a0000b60, processingUnit=SPFFT_PU_HOST, transformType=SPFFT_TRANS_C2C, dimX=2, dimY=2, dimZ=2, localZLength=2, numLocalElements=8, indexFormat=SPFFT_INDEX_TRIPLETS, indices=0x55fb923ec040 <indices>)
    at /home/matthias/libraries/SpFFT/src/spfft/grid.cpp:60
#27 0x00007f45cc666a70 in spfft_transform_create (transform=0x7f45cb344e10, grid=0x7f45a0000b60, processingUnit=SPFFT_PU_HOST, transformType=SPFFT_TRANS_C2C, dimX=2, dimY=2, dimZ=2, localZLength=2, numLocalElements=8, indexFormat=SPFFT_INDEX_TRIPLETS,
    indices=0x55fb923ec040 <indices>) at /home/matthias/libraries/SpFFT/src/spfft/transform.cpp:132
#28 0x000055fb923e941f in MAIN__::MAIN__._omp_fn.0 () at test.f90:49
#29 0x00007f45cc31e77e in ?? () from /lib/x86_64-linux-gnu/libgomp.so.1
#30 0x00007f45cbb55609 in start_thread (arg=<optimized out>) at pthread_create.c:477
#31 0x00007f45cc234103 in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
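
The backtrace ends inside FFTW's planner (fftw_plan_many_dft), reached from spfft_transform_create on an OpenMP worker thread. FFTW documents its planner routines as not thread-safe, so one possible explanation (not confirmed in this thread) is that several threads create plans at the same time. The sketch below only illustrates the general pattern of serializing handle creation while leaving the per-task work parallel. StubPlanner, make_plan and create_handles_serialized are hypothetical stand-ins, not FFTW or SpFFT functions, and for simplicity one handle is created per loop iteration rather than per thread.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a planner with unsynchronized shared state.
// It is NOT FFTW or SpFFT; it only mimics "creation mutates global state".
struct StubPlanner {
  std::vector<int> registry;  // shared bookkeeping, unsafe to modify concurrently

  int make_plan(int size) {
    registry.push_back(size);
    return static_cast<int>(registry.size()) - 1;  // handle = slot index
  }
};

// Creates one handle per task. Only the creation step is serialized.
// Without OpenMP enabled, the pragmas are ignored and the loop runs serially.
std::vector<int> create_handles_serialized(StubPlanner& planner, int num_tasks) {
  assert(num_tasks >= 0);
  std::vector<int> handles(num_tasks, -1);
#pragma omp parallel for
  for (int i = 0; i < num_tasks; ++i) {
#pragma omp critical(plan_creation)
    {
      // At most one thread at a time touches the planner's shared state.
      handles[i] = planner.make_plan(2);
    }
    // Independent per-task work (e.g. executing a transform) would go here,
    // outside the critical section.
  }
  return handles;
}
```

In the Fortran reproducer, the analogous change would be to wrap the spfft_grid_create and spfft_transform_create calls in an `!$OMP CRITICAL` / `!$OMP END CRITICAL` pair. Whether that resolves the crash has not been verified here.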

error: array must be initialized with a brace-enclosed initializer

The following error occurred while installing SpFFT with make -j8 install:

[PATH]/src/memory/host_array_view.hpp:164:3: error: array must be initialized with a brace-enclosed initializer
   HostArrayView3D() = default;

Any info regarding this would be helpful. This occurred while installing SpFFT locally on a cluster. I have attached the full output in a file.
SpFFTmake.log

`device_synchronize()` always ON on AMD

When compiling for AMD execution in Release mode, device_synchronize is still active, which causes overhead (found while profiling SIRIUS; a tracing snapshot is attached at the end).

More specifically, the #ifndef NDEBUG block before and after the kernel launch is always executed.

device_synchronize() is not present when compiled with CUDA in Release mode.

This is because the corresponding HIP_FLAGS are missing, whereas for CUDA they are explicitly defined in the CMake configuration.

[Image: tracing snapshot]
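
To make the mechanism concrete, here is a minimal sketch of an `#ifndef NDEBUG` guard around a kernel launch. The names device_synchronize_stub, launch_kernel_stub, launch_with_debug_sync and kDebugSyncEnabled are hypothetical stand-ins, not SpFFT's actual code. The point is only that the synchronization calls are compiled out when NDEBUG is defined, which CMake's default Release flags do for C++ and CUDA; if the define never reaches the HIP compiler, the guarded block stays active.

```cpp
#include <cassert>

// Hypothetical stand-ins for a device synchronization and a kernel launch.
static int g_sync_calls = 0;
void device_synchronize_stub() { ++g_sync_calls; }
void launch_kernel_stub() {}

// Records at compile time whether the debug-only synchronization is active.
#ifndef NDEBUG
constexpr bool kDebugSyncEnabled = true;
#else
constexpr bool kDebugSyncEnabled = false;
#endif

// Returns how many synchronizations ran around a single launch.
int launch_with_debug_sync() {
  g_sync_calls = 0;
#ifndef NDEBUG
  device_synchronize_stub();  // debug only: surface errors from earlier work
#endif
  launch_kernel_stub();
#ifndef NDEBUG
  device_synchronize_stub();  // debug only: surface errors from this launch
  assert(g_sync_calls == 2);
#endif
  return g_sync_calls;
}
```

Compiled with -DNDEBUG, launch_with_debug_sync() performs no synchronization; without it, there are two per launch, which matches the overhead described above.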

Provide SOVERSION for the shared library

I am interested in packaging SpFFT for Debian. Debian uses SOVERSIONs to track the ABI changes, therefore there is a requirement for libraries to have one. The simplest way to set a SOVERSION would be to add the following to CMakeLists.txt:

# set SOVERSION
set_property(TARGET spfft PROPERTY SOVERSION 0)

Here the SOVERSION is set to 0, but could be any version of your choice.

Unable to get correct result using openMPI

Hi,

I am trying to use the SpFFT library with OpenMPI. When I run the following code with a single process, I get the correct result. However, when I use 2 processes on the same input, the result of the sparse FFT is incorrect. I tried different orientations (indexing of points in the input domain), but without success.

I attach the simplified code I am using for the computation. It is based on the code in the benchmark.cpp and example.cpp files.

#include <algorithm>
#include <iostream>
#include <utility>
#include <vector>

#include <mpi.h>
#include <omp.h>

#include "spfft/spfft.hpp"

std::vector<std::pair<int, int>> createAllXYIndexPairs(int dimX, int dimY) {
    std::vector<std::pair<int, int>> indices;
    indices.reserve(dimX * dimY);
    for (int x = 0; x < dimX; x++) {
        for (int y = 0; y < dimY; y++) {
            indices.emplace_back(x, y);
        }
    }
    return indices;
}

std::vector<int>
createTripleIndices(int dimZ, int numLocalZSticks, int offset, std::vector<std::pair<int, int>> xyIndicesGlobal) {
    std::vector<int> indices;
    indices.reserve(numLocalZSticks);
    for (int i = offset; i < offset + numLocalZSticks; i++) {
        for (int z = 0; z < dimZ; z++) {
            indices.push_back(xyIndicesGlobal[i].first);
            indices.push_back(xyIndicesGlobal[i].second);
            indices.push_back(z);
        }
    }
    return indices;
}


int main(int argc, char **argv) {
    int dimX = 2;
    int dimY = 2;
    int dimZ = 2;

    int num;
    MPI_Init_thread(nullptr, nullptr, MPI_THREAD_MULTIPLE, &num);

    int commRank = 0;
    int commSize = 1;

    MPI_Comm_size(MPI_COMM_WORLD, &commSize);
    MPI_Comm_rank(MPI_COMM_WORLD, &commRank);

    // Use default OpenMP value
    const int numThreads = omp_get_max_threads();

    // Input signal
    std::vector<double> signal = {2, 2, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1};

    // Create all global x-y index pairs
    std::vector<std::pair<int, int>> xyIndicesGlobal = createAllXYIndexPairs(dimX, dimY);

    // Distribute z
    int numLocalZSticks =
            (xyIndicesGlobal.size()) / commSize + (commRank < (xyIndicesGlobal.size()) % commSize ? 1 : 0);
    const int offset =
            ((xyIndicesGlobal.size()) / commSize) * commRank +
            std::min(commRank, static_cast<int>(xyIndicesGlobal.size()) % commSize);

    // Assemble vector of x y z indices
    std::vector<int> xyzIndices = createTripleIndices(dimZ, numLocalZSticks, offset, xyIndicesGlobal);

    int maxLocalZLength = (dimZ / commSize) + (commRank < dimZ % commSize ? 1 : 0);

    spfft::Transform transform(numThreads, MPI_COMM_WORLD, SPFFT_EXCH_DEFAULT, SPFFT_PU_HOST, SPFFT_TRANS_C2C, dimX,
                               dimY, dimZ, maxLocalZLength, xyzIndices.size() / 3, SPFFT_INDEX_TRIPLETS,
                               xyzIndices.data());

    MPI_Barrier(MPI_COMM_WORLD);

    transform.backward(signal.data(), SPFFT_PU_HOST);

    double *spaceDomainPtr = transform.space_domain_data(SPFFT_PU_HOST);

    // Print result
    for (int i = 0; i < transform.local_slice_size(); ++i) {
        std::cout << "Backward transform point rank " << commRank << ": "  << spaceDomainPtr[2 * i] << ", " << spaceDomainPtr[2 * i + 1] << std::endl;
    }

    MPI_Finalize();
    return 0;
}

The same issue occurs when using a Grid instead of only a Transform, as follows:

    spfft::Grid grid(dimX, dimY, dimZ, numLocalZSticks, maxLocalZLength,
                     SPFFT_PU_HOST, numThreads, MPI_COMM_WORLD, SPFFT_EXCH_DEFAULT);

    MPI_Barrier(MPI_COMM_WORLD);

    auto transform = grid.create_transform(SPFFT_PU_HOST, SPFFT_TRANS_C2C, dimX, dimY, dimZ,
                                           maxLocalZLength, xyzIndices.size() / 3, SPFFT_INDEX_TRIPLETS,
                                           xyzIndices.data());

Thank you very much for any advice that would allow me to solve this issue. I really appreciate it.
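
One thing that may be worth checking: the code above hands the same signal.data() pointer to backward() on every rank, while each rank only owns a subset of the z-sticks. The sketch below reproduces the stick partitioning arithmetic from the posted code and adds the matching range inside the global, interleaved complex signal. It assumes, without confirmation in this thread, that each rank is expected to pass only the values belonging to its own index triplets. LocalSlice and localSlice are hypothetical helper names, not part of SpFFT.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>

// Hypothetical helper type: which part of the global data a rank owns.
struct LocalSlice {
  int numSticks;            // z-sticks owned by this rank
  int stickOffset;          // index of the first owned stick in the global list
  std::size_t firstDouble;  // first double of this rank's values in the signal
  std::size_t numDoubles;   // number of doubles (2 per complex value)
};

// Same block distribution as in the posted code: the first
// (numSticksGlobal % commSize) ranks receive one extra stick.
LocalSlice localSlice(int numSticksGlobal, int dimZ, int commRank, int commSize) {
  assert(commSize > 0 && commRank >= 0 && commRank < commSize);
  const int base = numSticksGlobal / commSize;
  const int remainder = numSticksGlobal % commSize;

  LocalSlice s;
  s.numSticks = base + (commRank < remainder ? 1 : 0);
  s.stickOffset = base * commRank + std::min(commRank, remainder);
  // Each stick holds dimZ complex values, stored as (real, imag) pairs.
  s.firstDouble = static_cast<std::size_t>(s.stickOffset) * dimZ * 2;
  s.numDoubles = static_cast<std::size_t>(s.numSticks) * dimZ * 2;
  return s;
}
```

For the 2 x 2 x 2 case on two ranks, rank 1 owns sticks 2 and 3, so under the stated assumption its values would start at signal.data() + 8 rather than at signal.data().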
