pghysels / strumpack

Structured Matrix Package (LBNL)

Home Page: http://portal.nersc.gov/project/sparse/strumpack/

License: Other

CMake 3.68% C++ 82.78% Fortran 8.42% C 1.23% Shell 0.88% CSS 0.61% HTML 0.07% Cuda 2.10% SWIG 0.22%
preconditioner linear-systems hss dense-matrices matrix-computations linear-algebra sparse-matrix sparse-linear-systems hpc

strumpack's People

Contributors

albertnichols3, chgorman, gchavez2, githubbeer, jieren98, lisaclaus, liuyangzhuan, lucyg16, michaelneuder, pghysels, rknop, sebastiangrimberg, topazus


strumpack's Issues

Complex-valued implementation of SparseSolver, (2) Mapping from C++ to Fortran

Hi, thanks for your previous help. I would also like to ask whether real- and complex-valued linear systems are handled differently by SparseSolver. I tried changing the real matrix in fexample.f90 (in the examples folder) to complex type, but I got errors like:

Intel MKL ERROR: Parameter 6 was incorrect on entry to DLASWP.

(2) Another important question is still about the different calls between C and Fortran that I asked about yesterday. For example, based on the documentation, I guess the mapping is:

StrumpackSparseSolver sp; // create solver object =====> (Fortran) type(STRUMPACK_SparseSolver) :: sp
sp.options().set_rel_tol(1e-10); // set options =====> (Fortran) ?? (an example would help)
sp.reorder(); // reorder matrix =====> (Fortran) ?? is it sp%reorder = 'metis'?

In fact, this mapping is the main difficulty for me. I expected it to be covered in the documentation, but it does not seem to be.
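
For what it's worth, the Fortran module wraps the flat C interface in StrumpackSparseSolver.h (mentioned in the SWIG/Fortran bindings issue further down this page), so the mapping goes C++ method -> C function -> Fortran subroutine of the same name. Below is a rough sketch of the C-side calls corresponding to the C++ lines above; apart from STRUMPACK_init_mt, which appears in another issue on this page, the function names and argument lists are my assumptions and should be checked against StrumpackSparseSolver.h:

#include "StrumpackSparseSolver.h"   /* the flat C wrapper that the Fortran module mirrors */

void c_interface_sketch(int N, int* row_ptr, int* col_ind, double* val,
                        double* b, double* x) {
  STRUMPACK_SparseSolver sp;                              /* ~ StrumpackSparseSolver sp; */
  STRUMPACK_init_mt(&sp, STRUMPACK_DOUBLE, STRUMPACK_MT,  /* precision/interface chosen at init */
                    0, NULL, 1);
  STRUMPACK_set_rel_tol(sp, 1e-10);      /* assumed setter, ~ sp.options().set_rel_tol(1e-10) */
  STRUMPACK_set_csr_matrix(sp, &N, row_ptr, col_ind, val, 0);  /* assumed, ~ sp.set_csr_matrix(...) */
  STRUMPACK_reorder(sp);                 /* assumed, ~ sp.reorder() */
  STRUMPACK_solve(sp, b, x, 0);          /* assumed, ~ sp.solve(b, x) */
  STRUMPACK_destroy(&sp);                /* assumed cleanup */
}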

Build STRUMPACK without Fortran possible ?

Hi Strumpack Developers,
I hope this is the place to ask questions about STRUMPACK.
Is it possible to build STRUMPACK without a Fortran compiler? I tried to, but could not.
The reason I am asking is that I use the Intel compiler for C/C++ but do not want to pay extra for a Fortran compiler.
Thanks,
Rochan

distributed memory version

Hi, my name is Mike Puso. I work with a number of finite element code developers at LLNL, and we are interested in your sparse direct solver in particular. I noticed that your latest paper mentions a distributed memory version, but the results were from a shared memory version. I was wondering what the status and availability of the distributed memory solver is. We would be very interested in collaborating with you and running it on our latest Sierra platform. We currently use PWSMP (leased) from Anshul Gupta's IBM group, SuperLU, MUMPS, and PaStiX. PWSMP is what we use the most due to its superior performance, but we would prefer to have a very good open source direct solver.

STRUMPACK 3.2.0 fails to link with descset_ and other functions from ScaLAPACK with icc, MKL and OpenMPI.

My compilation comes to the point where it tries to link test_HSS_seq and fails because it cannot find descset_

[ 90%] Linking CXX executable test_HSS_seq
cd /users/dslavchev/Programs/STRUMPACK_3.2.0/build/test && /users/dslavchev/local/bin/cmake -E cmake_link_script CMakeFiles/test_HSS_seq.dir/link.txt --verbose=1
/opt/intel/compilers_and_libraries_2019.5.281/linux/bin/intel64/icpc -std=c++14 -pthread -qopenmp -O3 -DNDEBUG -Wl,-rpath -Wl,/users/dslavchev/local/lib -Wl,--enable-new-dtags -pthread -Wl,-rpath -Wl,/users/dslavchev/local/lib -Wl,--enable-new-dtags -pthread -Wl,-rpath -Wl,/users/dslavchev/local/lib -Wl,--enable-new-dtags -rdynamic CMakeFiles/test_HSS_seq.dir/test_HSS_seq.cpp.o -o test_HSS_seq -Wl,-rpath,/users/dslavchev/local/lib ../libstrumpack.a /users/dslavchev/local/lib/libmpi.so /users/dslavchev/local/lib/libmpi_usempif08.so /users/dslavchev/local/lib/libmpi_usempi_ignore_tkr.so /users/dslavchev/local/lib/libmpi_mpifh.so /users/dslavchev/local/lib/libmpi.so /users/dslavchev/local/lib/libmpi_usempif08.so /users/dslavchev/local/lib/libmpi_usempi_ignore_tkr.so /users/dslavchev/local/lib/libmpi_mpifh.so -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lm -ldl -lmkl_intel_lp64 -lmkl_intel_thread -lmkl_core -liomp5 -lm -ldl /users/dslavchev/local/lib/libmetis.a -lifport -lifcoremt -lpthread
CMakeFiles/test_HSS_seq.dir/test_HSS_seq.cpp.o: In function std::_Function_handler<void (std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, strumpack::DenseMatrix<double>&), strumpack::HSS::LocalElemMult<double> >::_M_invoke(std::_Any_data const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, std::vector<unsigned long, std::allocator<unsigned long> > const&, strumpack::DenseMatrix<double>&)': test_HSS_seq.cpp:(.text._ZNSt17_Function_handlerIFvRKSt6vectorImSaImEES4_RN9strumpack11DenseMatrixIdEEENS5_3HSS13LocalElemMultIdEEE9_M_invokeERKSt9_Any_dataS4_S4_S8_[_ZNSt17_Function_handlerIFvRKSt6vectorImSaImEES4_RN9strumpack11DenseMatrixIdEEENS5_3HSS13LocalElemMultIdEEE9_M_invokeERKSt9_Any_dataS4_S4_S8_]+0x57d): undefined reference to descset_'

I am using OpenMPI with MKL and icc, version compilers_and_libraries_2019.5.281. I use the OpenMPI mpicc wrapper, which calls icc.

I am able to compile and run the simple MPI hello world from mkl/test.

I link MKL statically because the link advisor doesn't show OpenMPI as an option for dynamic linking. The link line is:
ScaLAPACKLIBS=" ${MKLROOT}/lib/intel64/libmkl_scalapack_lp64.a -Wl,--start-group ${MKLROOT}/lib/intel64/libmkl_intel_lp64.a ${MKLROOT}/lib/intel64/libmkl_intel_thread.a ${MKLROOT}/lib/intel64/libmkl_core.a ${MKLROOT}/lib/intel64/libmkl_blacs_openmpi_lp64.a -Wl,--end-group -liomp5 -lpthread -lm -ldl"

I have installed cmake-3.15.4, metis-5.1.0, parmetis-4.0.3, scotch_6.0.8, and openmpi-4.0.2, all in ~/local.

I am working on:
dslavchev@nv001:~/Programs$ lsb_release -a
LSB Version: :core-4.1-amd64:core-4.1-noarch
Distributor ID: RedHatEnterpriseServer
Description: Red Hat Enterprise Linux Server release 7.7 (Maipo)
Release: 7.7
Codename: Maipo

I have attached the full output of the build in the txt
build_output.txt

I use the build script in the .zip below. My machine is nv001, so it picks the first option.
dslavchev_build.zip

How to choose the sampling parameters in STRUMPACK-Dense-1.1.1?

Hello, when I used STRUMPACK-Dense-1.1.1 I found that the selection of the sampling parameters is important. But when the matrix size changes, how should the sampling parameters (min_rand_HSS, lim_rand_HSS, inc_rand_HSS and max_rand_HSS) be chosen? There is no description in the user's manual. Can you explain it to me? Thanks.

Issues with using STRUMPACK via PETSc when using kokkos matrices/vectors

Hi STRUMPACK-devs,

I'm trying to use STRUMPACK (on a CPU only system for now) via PETSc, with kokkos-matrices and vectors (the aijkokkos matrix type and kokkos vector type, overview).

However, the routine strumpack::SparseSolver<double, int>::set_csr_matrix segfaults. A backtrace is included below for reference:

sajid@LAPTOP-CDJT2P3R ~/p/a/3D (main)> gdb --args ./poisson3d -dm_mat_type aijkokkos -dm_vec_type kokkos -ksp_type gmres -pc_type lu -pc_factor_mat_solver_type strumpack -mat_strumpack_verbose -ksp_monitor -ksp_view -log_view
...
Reading symbols from ./poisson3d...
(gdb) run
Starting program: /home/sajid/packages/aclatfd/3D/poisson3d -dm_mat_type aijkokkos -dm_vec_type kokkos -ksp_type gmres -pc_type lu -pc_factor_mat_solver_type strumpack -mat_strumpack_verbose -ksp_monitor -ksp_view -log_view
...
# Initializing STRUMPACK
# using 16 OpenMP thread(s)
# number of tasking levels = 7 = log_2(#threads) + 3
# using 1 MPI processes

Thread 1 "poisson3d" received signal SIGSEGV, Segmentation fault.
0x00007ffff50ee20b in std::__1::default_delete<strumpack::CSRMatrix<double, int> >::operator() (this=0xe515e8, __ptr=0x7ffff5e465e8 <vtable for strumpack::MPIComm+16>)
    at /home/sajid/packages/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-11.2.0/llvm-13.0.0-igzxdbdikhixp3czhzxq3b4zo6rycypk/bin/../include/c++/v1/__memory/unique_ptr.h:57
57          delete __ptr;
(gdb) bt
#0  0x00007ffff50ee20b in std::__1::default_delete<strumpack::CSRMatrix<double, int> >::operator() (this=0xe515e8, __ptr=0x7ffff5e465e8 <vtable for strumpack::MPIComm+16>)
    at /home/sajid/packages/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-11.2.0/llvm-13.0.0-igzxdbdikhixp3czhzxq3b4zo6rycypk/bin/../include/c++/v1/__memory/unique_ptr.h:57
#1  std::__1::unique_ptr<strumpack::CSRMatrix<double, int>, std::__1::default_delete<strumpack::CSRMatrix<double, int> > >::reset (this=0xe515e8, __p=<optimized out>)
    at /home/sajid/packages/spack/opt/spack/linux-ubuntu20.04-zen2/gcc-11.2.0/llvm-13.0.0-igzxdbdikhixp3czhzxq3b4zo6rycypk/bin/../include/c++/v1/__memory/unique_ptr.h:318
#2  strumpack::SparseSolver<double, int>::set_csr_matrix (this=0xe51370, N=<optimized out>, row_ptr=<optimized out>, col_ind=<optimized out>, values=<optimized out>, symmetric_pattern=<optimized out>)
    at /tmp/sajid/spack-stage/spack-stage-strumpack-6.1.0-h2fgkyy55sixavjkz7yg4zlcaelmeao2/spack-src/src/SparseSolver.cpp:107
#3  0x00007ffff6e0cc11 in MatLUFactorNumeric_STRUMPACK () from /home/sajid/packages/spack/opt/spack/linux-ubuntu20.04-zen2/clang-13.0.0/petsc-main-agamz2b6swtdudjn7qhusnie3sohixjj/lib/libpetsc.so.3.016
#4  0x00007ffff6b0e5df in MatLUFactorNumeric () from /home/sajid/packages/spack/opt/spack/linux-ubuntu20.04-zen2/clang-13.0.0/petsc-main-agamz2b6swtdudjn7qhusnie3sohixjj/lib/libpetsc.so.3.016
#5  0x00007ffff77123a6 in PCSetUp_LU () from /home/sajid/packages/spack/opt/spack/linux-ubuntu20.04-zen2/clang-13.0.0/petsc-main-agamz2b6swtdudjn7qhusnie3sohixjj/lib/libpetsc.so.3.016
#6  0x00007ffff7728ffc in PCSetUp () from /home/sajid/packages/spack/opt/spack/linux-ubuntu20.04-zen2/clang-13.0.0/petsc-main-agamz2b6swtdudjn7qhusnie3sohixjj/lib/libpetsc.so.3.016
#7  0x00007ffff73fe6f6 in KSPSetUp () from /home/sajid/packages/spack/opt/spack/linux-ubuntu20.04-zen2/clang-13.0.0/petsc-main-agamz2b6swtdudjn7qhusnie3sohixjj/lib/libpetsc.so.3.016
#8  0x0000000000403452 in main (argc=15, argv=0x7fffffffbd58) at poisson3d.c:69

The source code is available here, in case it is needed.

Is this operation supported? If yes, could someone let me know what can be done to fix the above issue?

Thanks in advance!

msys2 (mingw), error in 32 bit

I want to create a package for msys2 (mingw). For the mingw64 version the package builds without errors, but for the mingw32 version I get the following error:

  [30/182] Building CXX object CMakeFiles/strumpack.dir/src/dense/BLASLAPACKWrapper.cpp.obj
  FAILED: CMakeFiles/strumpack.dir/src/dense/BLASLAPACKWrapper.cpp.obj 
  D:\M\msys64\mingw32\bin\g++.exe  -IC:/_/mingw-w64-strumpack/src/STRUMPACK-6.1.0/src -IC:/_/mingw-w64-strumpack/src/build-MINGW32 -march=pentium4 -mtune=generic -O2 -pipe -O3 -DNDEBUG -Wall -Wno-overloaded-virtual -fopenmp -MD -MT CMakeFiles/strumpack.dir/src/dense/BLASLAPACKWrapper.cpp.obj -MF CMakeFiles\strumpack.dir\src\dense\BLASLAPACKWrapper.cpp.obj.d -o CMakeFiles/strumpack.dir/src/dense/BLASLAPACKWrapper.cpp.obj -c C:/_/mingw-w64-strumpack/src/STRUMPACK-6.1.0/src/dense/BLASLAPACKWrapper.cpp
  C:/_/mingw-w64-strumpack/src/STRUMPACK-6.1.0/src/dense/BLASLAPACKWrapper.cpp:1728:17: error: redefinition of 'std::size_t strumpack::blas::lange(char, int, int, const size_t*, int)'
   1728 |     std::size_t lange(char norm, int m, int n, const std::size_t *a, int lda) { return 0; }
        |                 ^~~~~
  C:/_/mingw-w64-strumpack/src/STRUMPACK-6.1.0/src/dense/BLASLAPACKWrapper.cpp:1727:18: note: 'unsigned int strumpack::blas::lange(char, int, int, const unsigned int*, int)' previously defined here
   1727 |     unsigned int lange(char norm, int m, int n, const unsigned int *a, int lda) { return 0; }
        |                  ^~~~~
  [31/182] Building CXX object CMakeFiles/strumpack.dir/src/clustering/NeighborSearch.cpp.obj
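
The collision comes from the fact that on 32-bit targets std::size_t and unsigned int are the same type, so the two overloads quoted above become identical. Purely as an illustration of one possible guard (this is not necessarily how STRUMPACK fixes it, and the names below are only for the sketch), the std::size_t overload can be turned into a template that only participates when the two types differ (C++17):

#include <cstddef>
#include <type_traits>

namespace blas_sketch {

  // Always provided, matching the unsigned int overload from the error message.
  inline unsigned int lange(char norm, int m, int n,
                            const unsigned int* a, int lda) { return 0; }

  // Participates in overload resolution only when std::size_t is a distinct
  // type from unsigned int (typical 64-bit targets); on 32-bit mingw, where
  // the two types coincide, SFINAE removes it, so there is no redefinition.
  template<typename T = std::size_t,
           typename = std::enable_if_t<!std::is_same_v<T, unsigned int>>>
  inline T lange(char norm, int m, int n, const T* a, int lda) { return 0; }

} // namespace blas_sketch

int main() {
  unsigned int ua[1] = {0};
  std::size_t  sa[1] = {0};
  blas_sketch::lange('F', 1, 1, ua, 1);  // unsigned int overload
  blas_sketch::lange('F', 1, 1, sa, 1);  // template overload on 64-bit, same overload as above on 32-bit
  return 0;
}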

Problems with Scotch header

On Linux, I run into the following issue using openmpi and gcc:

In file included from /home/user/strumpack-2.1.0/src/PTScotchReordering.hpp:28:0,
                 from /home/user/strumpack-2.1.0/src/MatrixReorderingMPI.hpp:34,
                 from /home/user/strumpack-2.1.0/src/EliminationTreeMPI.hpp:35,
                 from /home/user/strumpack-2.1.0/src/StrumpackSparseSolverMPI.hpp:37,
                 from /home/user/strumpack-2.1.0/src/StrumpackSparseSolver.cpp:31:
/usr/include/scotch-int64/ptscotch.h:97:3: error: conflicting declaration ‘typedef struct SCOTCH_Arch SCOTCH_Arch’
 } SCOTCH_Arch;
   ^~~~~~~~~~~
In file included from /home/user/strumpack-2.1.0/src/ScotchReordering.hpp:33:0,
                 from /home/user/strumpack-2.1.0/src/MatrixReordering.hpp:41,
                 from /home/user/strumpack-2.1.0/src/StrumpackSparseSolver.hpp:52,
                 from /home/user/strumpack-2.1.0/src/StrumpackSparseSolver.cpp:30:
/usr/include/scotch-int64/scotch.h:97:3: note: previous declaration as ‘typedef struct SCOTCH_Arch SCOTCH_Arch’
 } SCOTCH_Arch;
   ^~~~~~~~~~~

Instructions about SLATE are missing

Hello, maintainers.

I've tried to use STRUMPACK with SLATE to get CUDA support in the MPI setting, and found that SLATE's ScaLAPACK API is uncommented in CMakeLists.txt in the current SLATE repository.

If STRUMPACK only works with a specific version of SLATE, please add instructions; that would be very helpful.

Failed compilation of STRUMPACK 6.3.0

Hello,

I have found two issues:

  1. I get the following error:
CMake Error at cmake/Modules/FindMETIS.cmake:168 (string):
  string sub-command REGEX, mode REPLACE needs at least 6 arguments total to
  command.
Call Stack (most recent call first):
  CMakeLists.txt:379 (find_package)

This one is easily fixable using the method from facebook/rocksdb#1230.

I.e., I went to cmake/Modules/FindMETIS.cmake:168 and put quotes (") around the last argument of string().

  2. After doing the above, cmake runs, but now I have compilation errors:
In file included from /home/dslavchev/NuclearTesting/software/STRUMPACK-6.3.0/src/sparse/ordering/MetisReordering.hpp:35,
                 from /home/dslavchev/NuclearTesting/software/STRUMPACK-6.3.0/src/sparse/CSRGraph.cpp:34:
/home/dslavchev/.local/include/metis.h:93:9: error: ‘SCOTCH_Num’ does not name a type
   93 | typedef SCOTCH_Num          idx_t;
      |         ^~~~~~~~~~
In file included from /home/dslavchev/NuclearTesting/software/STRUMPACK-6.3.0/src/sparse/CSRGraph.cpp:34:
/home/dslavchev/NuclearTesting/software/STRUMPACK-6.3.0/src/sparse/ordering/MetisReordering.hpp:47:16: error: ‘idx_t’ was not declared in this scope; did you mean ‘id_t’?
   47 |   (std::vector<idx_t>& xadj, std::vector<idx_t>& adjncy,
      |                ^~~~~
      |                id_t

Below is the full output and the script I use.
MKL is 2022, metis 5.1.0, parmetis 4.0.3, scotch 7.0.1 and gcc is 9.3.0

cmake_failure_dslavchev.txt
dslavche_build.sh.zip

Add Fortran bindings

Hi! @xiaoyeli and I chatted this morning at ECP about using SWIG-Fortran, which underpins ForTrilinos, to automatically generate Fortran-2003 bindings for STRUMPACK.

I'm creating this issue to see if this is a good direction to go and track progress. The process of generating wrappers is automated, not quite automatic, so it'll have to be done in stages. If you can point me to some of the lower-level classes that would make a good example, I can get started with demonstrating what the SWIG interface file (input) and generated bindings (output) look like. I could also throw SWIG at the flat wrapper StrumpackSparseSolver.h to show you what that would look like.

Installation of strumpack-3.1.0 on Windows 10

Hello!

I have encountered a problem installing STRUMPACK on the Windows platform.

Actually, as I understand it, the configuration was successful, but the final Makefile was not generated, so the commands "make", "make tests", and "make install" do not work. Also, there is no libstrumpack.a file.

These are the last lines in the command prompt:
-- Configuring done
-- Generating done
-- Build files have been written to:

Compilers are working, Metis library has been found.

Could you help me, please, with this issue?

Dmitriy

installation on windows platform

Hello!

I have encountered a problem installing STRUMPACK on the Windows platform.

I have no idea whether your project supports Windows. I tried to build it on Windows but failed several times. I am not very familiar with cmake, so I do not know if this is my fault.

My cmake command, in my own .bat file, is:


cmake.exe ../
-DCMAKE_BUILD_TYPE=Release
-DTPL_BLAS_LIBRARIES="C:/Users/godbian/Downloads/BLAS.lib"
pause


and output was


-- Building for: Visual Studio 16 2019
-- Selecting Windows SDK version 10.0.18362.0 to target Windows 10.0.19041.
-- The CXX compiler identification is MSVC 19.27.29112.0
-- The C compiler identification is MSVC 19.27.29112.0
-- The Fortran compiler identification is Intel 19.1.2.20200623
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.27.29110/bin/Hostx64/x64/cl.exe
-- Check for working CXX compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.27.29110/bin/Hostx64/x64/cl.exe -- works
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.27.29110/bin/Hostx64/x64/cl.exe
-- Check for working C compiler: C:/Program Files (x86)/Microsoft Visual Studio/2019/Professional/VC/Tools/MSVC/14.27.29110/bin/Hostx64/x64/cl.exe -- works
-- Detecting C compiler ABI info
-- Detecting C compiler ABI info - done
-- Detecting C compile features
-- Detecting C compile features - done
-- Check for working Fortran compiler: D:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2020/windows/bin/intel64/ifort.exe
-- Check for working Fortran compiler: D:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2020/windows/bin/intel64/ifort.exe -- works
-- Detecting Fortran compiler ABI info
-- Detecting Fortran compiler ABI info - done
-- Determine Intel Fortran Compiler Implicit Link Path
-- Determine Intel Fortran Compiler Implicit Link Path -- done
-- Checking whether D:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2020/windows/bin/intel64/ifort.exe supports Fortran 90
-- Checking whether D:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2020/windows/bin/intel64/ifort.exe supports Fortran 90 -- yes
-- Detecting Fortran/C Interface
-- Detecting Fortran/C Interface - Found GLOBAL and MODULE mangling
-- Verifying Fortran/CXX Compiler Compatibility
-- Verifying Fortran/CXX Compiler Compatibility - Success
-- Found MPI_C: D:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2020.2.254/windows/mpi/intel64/lib/release/impi.lib (found version "3.1")
-- Found MPI_CXX: D:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2020.2.254/windows/mpi/intel64/lib/release/impi.lib (found version "3.1")
-- Found MPI_Fortran: D:/Program Files (x86)/IntelSWTools/compilers_and_libraries_2020.2.254/windows/mpi/intel64/lib/release/impi.lib (found version "3.1")
-- Found MPI: TRUE (found version "3.1")
-- Found OpenMP_C: -openmp (found version "2.0")
-- Found OpenMP_CXX: -openmp (found version "2.0")
-- Found OpenMP_Fortran: -Qopenmp (found version "5.0")
-- Found OpenMP: TRUE (found version "2.0")
-- Support for OpenMP task depend/priority: FALSE
-- Support for OpenMP taskloop: FALSE
-- Looking for a CUDA compiler
-- Looking for a CUDA compiler - NOTFOUND
CMake Warning at CMakeLists.txt:110 (message):
CUDA compiler not found, proceeding without CUDA support.
CMake Warning at CMakeLists.txt:123 (find_package):
By not providing "Findhip.cmake" in CMAKE_MODULE_PATH this project has
asked CMake to find a package configuration file provided by "hip", but
CMake did not find one.
Could not find a package configuration file provided by "hip" with any of
the following names:
hipConfig.cmake
hip-config.cmake
Add the installation prefix of "hip" to CMAKE_PREFIX_PATH or set "hip_DIR"
to a directory containing one of the above files. If "hip" provides a
separate development package or SDK, be sure it has been installed.
CMake Warning at CMakeLists.txt:149 (message):
HIP compiler not found, proceeding without HIP support.
-- Linking with TPL_BLAS_LIBRARIES did not work, trying again with additional threading library linked in.
-- Looking for pthread.h
-- Looking for pthread.h - not found
-- Found Threads: TRUE
CMake Error at CMakeLists.txt:185 (message):
BLAS libraries defined in TPL_BLAS_LIBRARIES
(D:/Tools/lapack-master/build/lib/libblas.lib) cannot be used.
-- Configuring incomplete, errors occurred!
See also "E:/UGit/STRUMPACK/build/CMakeFiles/CMakeOutput.log".
See also "E:/UGit/STRUMPACK/build/CMakeFiles/CMakeError.log".


Could you help me, please, with this issue?

Solving MFEM's ComplexHypreParMatrix in Strumpack's native complex arithmetic

I am attempting to use Strumpack to solve a complex-valued, indefinite Maxwell equation (motivated by mfem/mfem#2869 (comment)).

I found the mfem::STRUMPACKSolver interface and its usage in MFEM's ex11p.cpp (thanks for that!). Then I adapted the ex25p time-harmonic Maxwell example by replacing the SuperLU solver section with Strumpack:

HypreParMatrix *A = Ah.As<ComplexHypreParMatrix>()->GetSystemMatrix();
Operator * Arow = NULL;
Arow = new STRUMPACKRowLocMatrix(*A);

STRUMPACKSolver * strumpack = new STRUMPACKSolver(argc, argv, MPI_COMM_WORLD);
strumpack->SetPrintFactorStatistics(true);
strumpack->SetPrintSolveStatistics(true);
strumpack->SetKrylovSolver(strumpack::KrylovSolver::DIRECT);
strumpack->SetReorderingStrategy(strumpack::ReorderingStrategy::METIS);
strumpack->DisableMatching();
strumpack->SetOperator(*Arow);
strumpack->Mult(B, X);
delete A;

The code works and the result is correct. However, I believe the GetSystemMatrix() call converts the complex matrix to a block 2x2 real form, which is not optimal for performance.

The strumpack->SetOperator() function only takes a STRUMPACKRowLocMatrix, not a complex operator (error: STRUMPACKSolver::SetOperator : not STRUMPACKRowLocMatrix!). The STRUMPACKRowLocMatrix() converter takes only a real HypreParMatrix, not a complex one (error: no matching function for call to ‘mfem::STRUMPACKRowLocMatrix::STRUMPACKRowLocMatrix(mfem::ComplexHypreParMatrix&)’).

Any suggestions on how to quickly modify the mfem::STRUMPACKSolver interface to take ComplexHypreParMatrix? Thanks!

Segfault on MPI HSS generation with laplace kernel.

Hello thank you for this library :)

I'm trying to generate a distributed HSS matrix using the following code but run into a segfault:
Code:

#include <cmath>
#include <iostream>

#include "dense/DistributedMatrix.hpp"
#include "HSS/HSSMatrixMPI.hpp"
#include "kernel/Kernel.hpp"
using namespace strumpack;
using namespace strumpack::HSS;

int main(int argc, char * argv[]) {
  MPI_Init(&argc, &argv);
  int N = atoi(argv[1]);

  HSSOptions<double> hss_opts;
  hss_opts.set_verbose(false);
  hss_opts.set_from_command_line(argc, argv);

  BLACSGrid grid(MPI_COMM_WORLD);
  // DistributedMatrix<double> A;

  DenseMatrix<double> vector(N, 1);  // data for a 1D laplace kernel
  for (int i = 0; i < N; ++i) {
    vector(i, 0) = i;
  }

  auto laplace1d = kernel::LaplaceKernel<double>(vector, 100, 1000);

  HSSMatrixMPI<double> HSS(laplace1d, &grid, hss_opts);

  scalapack::Cblacs_exit(1);
  MPI_Finalize();
}

Run script:

mpicxx -DMKL_ILP64  -m64  -I"${MKLROOT}/include" -fopenmp -g \
     -I/home/acb10922qh/gitrepos/STRUMPACK/build/include main.cpp /home/acb10922qh/gitrepos/STRUMPACK/build/lib/libstrumpack.a \
     -L${MKLROOT}/lib/intel64 -lmkl_scalapack_ilp64 -Wl,--no-as-needed -lmkl_intel_ilp64 -lmkl_gnu_thread -lmkl_core -lmkl_blacs_intelmpi_ilp64 -lgomp -lpthread -lm -ldl -o main
mpirun -n 2 ./main 1000

Error :

(base) [acb10922qh@g0119 strumpack]$ bash run.sh
WARNING: debug_mt library was used but no multi-ep feature was enabled. Please use debug library instead.
[g0119:5737 :0:5737] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x2000000018)
[g0119:5738 :0:5738] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x2000000018)
==== backtrace (tid:   5738) ====
 0 0x000000000004cb15 ucs_debug_print_backtrace()  ???:0
 1 0x000000000040acf4 strumpack::DenseMatrix<double>::cols()  /home/acb10922qh/gitrepos/STRUMPACK/build/include/dense/DenseMatrix.hpp:218
 2 0x000000000058d84c strumpack::binary_tree_clustering<double>()  ???:0
 3 0x0000000000503df3 strumpack::HSS::HSSMatrixMPI<double>::HSSMatrixMPI()  ???:0
 4 0x0000000000409b51 main()  /home/acb10922qh/gitrepos/learn-distributed-weak-admis/past-work/strumpack/main.cpp:28
 5 0x0000000000022445 __libc_start_main()  ???:0
 6 0x0000000000409939 _start()  ???:0
=================================
==== backtrace (tid:   5737) ====
 0 0x000000000004cb15 ucs_debug_print_backtrace()  ???:0
 1 0x000000000040acf4 strumpack::DenseMatrix<double>::cols()  /home/acb10922qh/gitrepos/STRUMPACK/build/include/dense/DenseMatrix.hpp:218
 2 0x000000000058d84c strumpack::binary_tree_clustering<double>()  ???:0
 3 0x0000000000503df3 strumpack::HSS::HSSMatrixMPI<double>::HSSMatrixMPI()  ???:0
 4 0x0000000000409b51 main()  /home/acb10922qh/gitrepos/learn-distributed-weak-admis/past-work/strumpack/main.cpp:28
 5 0x0000000000022445 __libc_start_main()  ???:0
 6 0x0000000000409939 _start()  ???:0
=================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 5737 RUNNING AT g0119
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 1 PID 5738 RUNNING AT g0119
=   KILLED BY SIGNAL: 11 (Segmentation fault)
===================================================================================

Hanging omp pragmas in BLRMatrixMPI.cpp and LLVM compilers

Hi, just wanted to apprise you of an issue I ran into using an LLVM based compiler (Intel).
In the file BLRMatrixMPI.cpp, there are a number of code segments with OpenMP pragmas such as:

505   for (std::size_t j=i+1; j<B1; j++)
506     if (g->is_local_col(j) && adm(i, j))
507 #pragma omp task default(shared) firstprivate(i,j)
508       A11.compress_tile(i, j, opts);

This may be OK in gcc, but in LLVM this creates an issue where the pragma is treated as the line following the if, and line 508 is left orphaned from the for-if block. This produces a compiler error where the variable j is undefined. There are probably 10-12 instances of this in the file, and each is fixed by doing something like:

505   for (std::size_t j=i+1; j<B1; j++)
506     if (g->is_local_col(j) && adm(i, j)) {
507 #pragma omp task default(shared) firstprivate(i,j)
508       A11.compress_tile(i, j, opts); }

Once that's fixed, the code base compiles fine.
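
For reference, here is a small self-contained version of the pattern (not STRUMPACK code) with the braces in place, so the task construct is unambiguously the body of the if and the loop variable j stays in scope:

#include <cstdio>
#include <cstddef>

int main() {
  const std::size_t B1 = 8;
  #pragma omp parallel
  #pragma omp single
  for (std::size_t i = 0; i < B1; i++)
    for (std::size_t j = i + 1; j < B1; j++)
      if ((i + j) % 2 == 0) {  // braces keep the task inside the if/for block
        #pragma omp task default(shared) firstprivate(i, j)
        std::printf("tile (%zu, %zu)\n", i, j);
      }
  return 0;
}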

invalid read with fortran example, dense package 1.1.1

I am using version 1.1.1, because it seems that the newest version has lost the Fortran support...

If I modify the example to use n = 169 and bs = 10 (which corresponds to a certain example I was using in my own code), I get the following valgrind error when I run it with one process:

==14914== Invalid read of size 8
==14914==    at 0x49B9B1: HSS_tree<std::complex<double>, double>::distMatrixTree(std::complex<double>*, int*, int, std::complex<double>**, int**, std::complex<double>**, int**, std::complex<double>**, int**, gridinfo*, int) (HSS_par_tools.hpp:298)
==14914==    by 0x4EE3B2: HSS_tree<std::complex<double>, double>::HSSR_par_Compress(std::complex<double>*, int*, double, int, int, int, int, int*, int*, gridinfo*, std::complex<double>*, std::complex<double>*, std::complex<double>*, std::complex<double>*, int*, void (*)(void*, int*, int*, std::complex<double>*, int*), bool) (HSSR_par_Compress.hpp:289)
==14914==    by 0x513059: StrumpackDensePackage<std::complex<double>, double>::compress(std::complex<double>*, int*) (StrumpackDensePackage_CPP.hpp:329)
==14914==    by 0x40D45A: SDP_C_dcomplex_compress_A (StrumpackDensePackage_C.cpp:479)
==14914==    by 0x40C47A: MAIN__ (in /opt/STRUMPACK-Dense-1.1.1/examples/f90_example)
==14914==    by 0x40BEEC: main (in /opt/STRUMPACK-Dense-1.1.1/examples/f90_example)
==14914==  Address 0x1c3f0308 is 8 bytes before a block of size 8 alloc'd
==14914==    at 0x4C2E80F: operator new[](unsigned long) (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==14914==    by 0x4EE2CC: HSS_tree<std::complex<double>, double>::HSSR_par_Compress(std::complex<double>*, int*, double, int, int, int, int, int*, int*, gridinfo*, std::complex<double>*, std::complex<double>*, std::complex<double>*, std::complex<double>*, int*, void (*)(void*, int*, int*, std::complex<double>*, int*), bool) (HSSR_par_Compress.hpp:283)
==14914==    by 0x513059: StrumpackDensePackage<std::complex<double>, double>::compress(std::complex<double>*, int*) (StrumpackDensePackage_CPP.hpp:329)
==14914==    by 0x40D45A: SDP_C_dcomplex_compress_A (StrumpackDensePackage_C.cpp:479)
==14914==    by 0x40C47A: MAIN__ (in /opt/STRUMPACK-Dense-1.1.1/examples/f90_example)
==14914==    by 0x40BEEC: main (in /opt/STRUMPACK-Dense-1.1.1/examples/f90_example)

This occurs twice for process 0.

If I use 4 processes, the error happens for both process 2 and process 3.

It does, however, seem that the program runs fine.

Do you know what the problem could be, and is this error reproducible?

All MPI tests fail

I am trying to run Strumpack on Mac OS 10.13 using clang + open-mpi + openblas.
I have also tried gcc, and Accelerate instead of OpenBLAS. When running 'make test', all the MPI tests fail for some reason.

Any tips on how to run strumpack on Mac are greatly appreciated!

This is the output from make test

56% tests passed, 82 tests failed out of 187

Total Test time (real) = 135.97 sec

The following tests FAILED:
	 22 - HSS_mpi_P13_Th1_ML_N500_L8_T1e-4_A1e-10_Senable_Coriginal_D016_DD8 (Failed)
	 23 - HSS_mpi_P13_Th1_ML_N500_L8_T1e-4_A1e-10_Sdisable_Coriginal_D016_DD8 (Failed)
	 24 - HSS_mpi_P13_Th1_ML_N500_L8_T1e-4_A1e-10_Senable_Cstable_D016_DD8 (Failed)
	 25 - HSS_mpi_P13_Th1_ML_N500_L8_T1e-4_A1e-10_Sdisable_Cstable_D016_DD8 (Failed)
	 26 - HSS_mpi_P19_Th1_MU_N10_L3_T1e-1_A1e-13_Senable_Cstable_D0128_DD8 (Failed)
	 27 - HSS_mpi_P4_Th1_MT_N200_L128_T1_A1e-13_Senable_Cstable_D032_DD4 (Failed)
	 29 - HSS_mpi_P13_Th1_MU_N500_L3_T1_A1e-10_Sdisable_Coriginal_D032_DD8 (Failed)
	 30 - HSS_mpi_P4_Th1_ML_N200_L1_T1e-5_A1e-10_Sdisable_Coriginal_D016_DD8 (Failed)
	 35 - HSS_mpi_P4_Th1_ML_N200_L16_T1e-1_A1e-13_Sdisable_Cstable_D016_DD8 (Failed)
	 37 - HSS_mpi_P17_Th1_MU_N200_L128_T1e-10_A1e-13_Senable_Cstable_D0128_DD8 (Failed)
	 39 - HSS_mpi_P16_Th1_MT_N500_L3_T1e-5_A1e-10_Senable_Coriginal_D0128_DD4 (Failed)
	 40 - HSS_mpi_P13_Th1_ML_N10_L1_T1e-1_A1e-13_Sdisable_Coriginal_D032_DD4 (Failed)
	 41 - HSS_mpi_P17_Th1_MT_N200_L3_T1e-1_A1e-13_Sdisable_Coriginal_D064_DD4 (Failed)
	 43 - HSS_mpi_P4_Th1_ML_N1_L1_T1_A1e-10_Sdisable_Cstable_D032_DD8 (Failed)
	 44 - HSS_mpi_P19_Th1_MT_N500_L3_T1e-5_A1e-13_Senable_Coriginal_D064_DD8 (Failed)
	 45 - HSS_mpi_P16_Th1_ML_N10_L1_T1e-1_A1e-13_Senable_Coriginal_D016_DD4 (Failed)
	 46 - HSS_mpi_P19_Th1_MU_N1_L3_T1e-5_A1e-10_Sdisable_Coriginal_D0128_DD8 (Failed)
	 47 - HSS_mpi_P16_Th1_MU_N500_L128_T1e-5_A1e-13_Senable_Cstable_D016_DD4 (Failed)
	 48 - HSS_mpi_P17_Th1_ML_N500_L128_T1e-10_A1e-13_Senable_Cstable_D064_DD4 (Failed)
	 49 - HSS_mpi_P13_Th1_ML_N200_L1_T1e-5_A1e-13_Senable_Coriginal_D032_DD8 (Failed)
	 50 - HSS_mpi_P9_Th1_MT_N1_L1_T1e-1_A1e-10_Sdisable_Coriginal_D064_DD4 (Failed)
	 51 - HSS_mpi_P13_Th1_MT_N200_L16_T1_A1e-13_Sdisable_Coriginal_D032_DD8 (Failed)
	 52 - HSS_mpi_P9_Th1_MU_N200_L16_T1e-1_A1e-13_Sdisable_Coriginal_D016_DD8 (Failed)
	 53 - HSS_mpi_P16_Th1_ML_N10_L128_T1e-5_A1e-13_Senable_Coriginal_D016_DD8 (Failed)
	 54 - HSS_mpi_P19_Th1_MT_N200_L16_T1e-10_A1e-13_Senable_Cstable_D016_DD4 (Failed)
	 55 - HSS_mpi_P16_Th1_MT_N10_L128_T1e-5_A1e-13_Sdisable_Coriginal_D016_DD4 (Failed)
	 56 - HSS_mpi_P9_Th1_MT_N200_L128_T1_A1e-10_Senable_Cstable_D016_DD8 (Failed)
	 58 - HSS_mpi_P13_Th1_ML_N200_L128_T1e-1_A1e-13_Sdisable_Cstable_D0128_DD8 (Failed)
	 59 - HSS_mpi_P16_Th1_ML_N1_L3_T1e-5_A1e-10_Senable_Coriginal_D064_DD4 (Failed)
	 60 - HSS_mpi_P4_Th1_MU_N500_L128_T1_A1e-13_Sdisable_Coriginal_D032_DD4 (Failed)
	 62 - HSS_mpi_P16_Th1_ML_N200_L128_T1e-5_A1e-10_Sdisable_Coriginal_D0128_DD4 (Failed)
	 63 - HSS_mpi_P4_Th1_ML_N500_L1_T1e-5_A1e-10_Sdisable_Coriginal_D032_DD4 (Failed)
	 64 - HSS_mpi_P13_Th1_ML_N1_L16_T1e-5_A1e-13_Senable_Cstable_D032_DD8 (Failed)
	 65 - HSS_mpi_P4_Th1_MU_N10_L128_T1e-10_A1e-10_Sdisable_Cstable_D032_DD4 (Failed)
	 66 - HSS_mpi_P13_Th1_ML_N200_L1_T1_A1e-13_Sdisable_Coriginal_D032_DD4 (Failed)
	 68 - HSS_mpi_P13_Th1_MT_N500_L3_T1_A1e-10_Senable_Coriginal_D016_DD8 (Failed)
	 69 - HSS_mpi_P19_Th1_MU_N1_L16_T1e-10_A1e-10_Senable_Coriginal_D032_DD4 (Failed)
	 70 - HSS_mpi_P17_Th1_MT_N200_L1_T1_A1e-13_Senable_Cstable_D0128_DD8 (Failed)
	 71 - HSS_mpi_P19_Th1_MT_N500_L1_T1e-10_A1e-10_Sdisable_Cstable_D0128_DD8 (Failed)
	 73 - HSS_mpi_P17_Th1_ML_N10_L16_T1_A1e-10_Senable_Coriginal_D0128_DD4 (Failed)
	 75 - HSS_mpi_P16_Th1_ML_N10_L128_T1e-10_A1e-10_Sdisable_Cstable_D016_DD4 (Failed)
	 76 - HSS_mpi_P17_Th1_MU_N10_L3_T1e-1_A1e-13_Sdisable_Coriginal_D016_DD4 (Failed)
	138 - SPARSE_mpi_P16_Th1_Mutm300_NDptscotch_Cenable_L8_T1e-5_A1e-10_D016_DD8_SEP25 (Failed)
	139 - SPARSE_mpi_P13_Th1_Mutm300_NDscotch_Cenable_L8_T1e-1_A1e-10_D016_DD8_SEP25 (Failed)
	140 - SPARSE_mpi_P16_Th1_Mutm300_NDparmetis_Cenable_L8_T1_A1e-10_D016_DD8_SEP25 (Failed)
	141 - SPARSE_mpi_P19_Th1_Mcavity16_NDmetis_Cenable_L8_T1_A1e-10_D016_DD8_SEP25 (Failed)
	142 - SPARSE_mpi_P4_Th1_Mcavity16_NDscotch_Cenable_L8_T1e-1_A1e-10_D016_DD8_SEP25 (Failed)
	143 - SPARSE_mpi_P13_Th1_Mcbuckle_NDscotch_Cenable_L8_T1_A1e-10_D016_DD8_SEP25 (Failed)
	144 - SPARSE_mpi_P4_Th1_Mcz10228_NDscotch_Cenable_L8_T1e-5_A1e-10_D016_DD8_SEP25 (Failed)
	145 - SPARSE_mpi_P16_Th1_Mcz10228_NDscotch_Cenable_L8_T1_A1e-10_D016_DD8_SEP25 (Failed)
	146 - SPARSE_mpi_P9_Th1_Mutm300_NDparmetis_Cenable_L8_T1_A1e-10_D016_DD8_SEP25 (Failed)
	147 - SPARSE_mpi_P19_Th1_Mcavity16_NDparmetis_Cenable_L8_T1e-5_A1e-10_D016_DD8_SEP25 (Failed)
	148 - SPARSE_mpi_P16_Th1_Msherman4_NDscotch_Cenable_L8_T1_A1e-10_D016_DD8_SEP25 (Failed)
	150 - SPARSE_mpi_P17_Th1_Mcz10228_NDparmetis_Cenable_L8_T1e-10_A1e-10_D016_DD8_SEP25 (Failed)
	151 - SPARSE_mpi_P13_Th1_Mt2dal_NDscotch_Cenable_L8_T1_A1e-10_D016_DD8_SEP25 (Failed)
	152 - SPARSE_mpi_P4_Th1_Mcavity16_NDptscotch_Cenable_L8_T1_A1e-10_D016_DD8_SEP25 (Failed)
	153 - SPARSE_mpi_P9_Th1_Mcbuckle_NDmetis_Cenable_L8_T1_A1e-10_D016_DD8_SEP25 (Failed)
	154 - SPARSE_mpi_P13_Th1_Mcz10228_NDptscotch_Cenable_L8_T1e-1_A1e-10_D016_DD8_SEP25 (Failed)
	155 - SPARSE_mpi_P9_Th1_Mcbuckle_NDptscotch_Cenable_L8_T1e-5_A1e-10_D016_DD8_SEP25 (Failed)
	156 - SPARSE_mpi_P9_Th1_Mrdb968_NDptscotch_Cenable_L8_T1e-10_A1e-10_D016_DD8_SEP25 (Failed)
	158 - SPARSE_mpi_P4_Th1_Mutm300_NDptscotch_Cenable_L8_T1_A1e-10_D016_DD8_SEP25 (Failed)
	160 - SPARSE_mpi_P9_Th1_Mrdb968_NDscotch_Cenable_L8_T1_A1e-10_D016_DD8_SEP25 (Failed)
	161 - SPARSE_mpi_P17_Th1_Mbcsstk28_NDscotch_Cenable_L8_T1e-1_A1e-10_D016_DD8_SEP25 (Failed)
	163 - SPARSE_mpi_P2_Th1_Mcavity16_NDscotch_Cenable_L8_T1e-1_A1e-10_D016_DD8_SEP25 (Failed)
	164 - SPARSE_mpi_P16_Th1_Mmesh3e1_NDparmetis_Cenable_L8_T1e-5_A1e-10_D016_DD8_SEP25 (Failed)
	165 - SPARSE_mpi_P9_Th1_Mmesh3e1_NDscotch_Cenable_L8_T1e-1_A1e-10_D016_DD8_SEP25 (Failed)
	166 - SPARSE_mpi_P16_Th1_Mrdb968_NDptscotch_Cenable_L8_T1e-5_A1e-10_D016_DD8_SEP25 (Failed)
	167 - SPARSE_mpi_P16_Th1_Msherman4_NDparmetis_Cenable_L8_T1e-5_A1e-10_D016_DD8_SEP25 (Failed)
	168 - SPARSE_mpi_P17_Th1_Mrdb968_NDparmetis_Cenable_L8_T1e-10_A1e-10_D016_DD8_SEP25 (Failed)
	169 - SPARSE_mpi_P17_Th1_Mbcsstk28_NDptscotch_Cenable_L8_T1_A1e-10_D016_DD8_SEP25 (Failed)
	170 - SPARSE_mpi_P4_Th1_Mt2dal_NDscotch_Cenable_L8_T1e-5_A1e-10_D016_DD8_SEP25 (Failed)
	171 - SPARSE_mpi_P4_Th1_Mbcsstk28_NDmetis_Cenable_L8_T1e-1_A1e-10_D016_DD8_SEP25 (Failed)
	173 - SPARSE_mpi_P4_Th1_Mcbuckle_NDscotch_Cenable_L8_T1e-10_A1e-10_D016_DD8_SEP25 (Failed)
	174 - SPARSE_mpi_P17_Th1_Mbcsstm08_NDparmetis_Cenable_L8_T1_A1e-10_D016_DD8_SEP25 (Failed)
	175 - SPARSE_mpi_P19_Th1_Mbcsstm08_NDparmetis_Cenable_L8_T1e-1_A1e-10_D016_DD8_SEP25 (Failed)
	177 - SPARSE_mpi_P17_Th1_Mcavity16_NDptscotch_Cenable_L8_T1e-1_A1e-10_D016_DD8_SEP25 (Failed)
	178 - SPARSE_mpi_P9_Th1_Mmesh3e1_NDptscotch_Cenable_L8_T1e-10_A1e-10_D016_DD8_SEP25 (Failed)
	179 - SPARSE_mpi_P17_Th1_Mcz10228_NDparmetis_Cenable_L8_T1e-5_A1e-10_D016_DD8_SEP25 (Failed)
	180 - SPARSE_mpi_P13_Th1_Mcz10228_NDscotch_Cenable_L8_T1e-1_A1e-10_D016_DD8_SEP25 (Failed)
	181 - SPARSE_mpi_P2_Th1_Mcbuckle_NDparmetis_Cenable_L8_T1_A1e-10_D016_DD8_SEP25 (Failed)
	183 - SPARSE_mpi_P19_Th1_Msherman4_NDptscotch_Cenable_L8_T1_A1e-10_D016_DD8_SEP25 (Failed)
	184 - SPARSE_mpi_P16_Th1_Mrdb968_NDptscotch_Cenable_L8_T1e-1_A1e-10_D016_DD8_SEP25 (Failed)
Errors while running CTest
make: *** [test] Error 8

multilib not respected

This is because of

LIBRARY DESTINATION lib ARCHIVE DESTINATION lib)

Please do not hardcode lib; instead use include(GNUInstallDirs) and ${CMAKE_INSTALL_LIBDIR}.

 * Final size of build directory: 137412 KiB (134.1 MiB)
 * Final size of installed tree:   37768 KiB ( 36.8 MiB)
Files matching a file type that is not allowed:
   usr/lib/libstrumpack.so
 * ERROR: sci-libs/STRUMPACK-6.3.1-r1::guru failed:
 *   multilib-strict check failed!

SparseSolverMPIDist for Fortran use

Hi Pieter! I am now trying to use SparseSolverMPIDist from Fortran. I am wondering if STRUMPACK supports that, since I didn't find a subroutine (Strumpack_set_distributed_csr_matrix) in src/fortran/strumpack.f90. Does that mean there is no such interface for Fortran users? Thank you.

[Feature Request] scikit-learn compatible GPR module

Hi @pghysels, @liuyangzhuan, and @xiaoyeli,

Following up on our discussion, we would like to explore the possibility of subclassing GaussianProcessRegressor from scikit-learn. The definition of the regressor is here.

The idea is to let users provide a kernel that conforms to the interface of sklearn.gaussian_process.kernels.Kernel, and then you are free to iterate the kernel over a list of submatrices of the overall kernel matrix for compression.

Let me know if there is anything that I can help here.

How to get the matrix output?

I use testMMdoubleMPIDist64 and modified the program to print the result vector (x), but the results are wrong. I use one process and one thread. Can you tell me how to get the matrix output? Thank you.

The MPI_Abort() function was called after MPI_FINALIZE was invoked.

When I run a simple MPI program with STRUMPACK I get the error below. It happens at the end of the program, when the MPI and BLACS resources are being shut down.

I am on

$cat /etc/system-release-cpe 
cpe:/o:redhat:enterprise_linux:7.7:ga:server

I have compiled STRUMPACK with OpenMPI version 4.0.2 and gcc 9.2.0.

This is the program. It reads a matrix and a right-hand-side vector from the file system and prints a few numbers.

strumpack_simple_solver.tar.gz

$ mpirun -np 4 ./main 250
0 0 values:
   0.001755 0.002898 0.002898 
   0.002021 0.003055 0.003793 
   0.001651 0.003699 0.004182 
0 0 sv alues:
   0.000000 
   0.000227 
   0.000874 
1 0 values:
   0.001333 0.002958 0.002958 
   0.001161 0.002424 0.004090 
   0.001038 0.002128 0.003365 
1 0 sv alues:
   0.001881 
   0.003179 
   0.004690 
0 1 values:
   0.004286 0.004700 0.004700 
   0.004352 0.004751 0.005017 
   0.004580 0.004918 0.005149 
1 1 values:
   0.005172 0.005261 0.005261 
   0.006489 0.006038 0.005841 
   0.005086 0.007656 0.006787 
*** The MPI_Abort() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[nv001.avitohol.acad.bg:59551] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** The MPI_Abort() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[nv001.avitohol.acad.bg:59550] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
*** The MPI_Abort() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[nv001.avitohol.acad.bg:59552] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
Primary job  terminated normally, but 1 process returned
a non-zero exit code. Per user-direction, the job has been aborted.
--------------------------------------------------------------------------
*** The MPI_Abort() function was called after MPI_FINALIZE was invoked.
*** This is disallowed by the MPI standard.
*** Your MPI job will now abort.
[nv001.avitohol.acad.bg:59553] Local abort after MPI_FINALIZE started completed successfully, but am not able to aggregate error messages, and not able to guarantee that all other processes were killed!
--------------------------------------------------------------------------
mpirun detected that one or more processes exited with non-zero status, thus causing
the job to be terminated. The first process to do so was:

  Process name: [[39622,1],1]
  Exit code:    1
--------------------------------------------------------------------------
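
Without seeing the attached code, one common cause of this exact message with libraries that use BLACS/ScaLAPACK is object lifetime: if the solver object is still alive when MPI_Finalize() runs, its destructor releases MPI/BLACS resources after MPI has already shut down, which ends in MPI_Abort. Below is a minimal sketch of the ordering that avoids this; the class name, header, and constructor arguments follow the MPI-distributed solver interface as I recall it and may differ between STRUMPACK versions:

#include <mpi.h>
#include "StrumpackSparseSolverMPIDist.hpp"  // header name assumed for this sketch

int main(int argc, char* argv[]) {
  MPI_Init(&argc, &argv);
  {
    // Keep the solver in a scope that closes before MPI_Finalize(), so its
    // destructor (which tears down BLACS grids and MPI communicators) runs
    // while MPI is still initialized.
    strumpack::StrumpackSparseSolverMPIDist<double,int>
      solver(MPI_COMM_WORLD, argc, argv, /*verbose*/ true);
    // ... set the distributed CSR matrix, reorder, factor, solve ...
  }  // solver destroyed here
  MPI_Finalize();
  return 0;
}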

Adding a matrix to an HSS compressed matrix or ULV factorized matrix

Hello Pieter,

I want to solve a parabolic equation as a series of systems of the form:
(M / \sigma_j + A) u_{j+1} = M f_j + u_j / \sigma_j

A is a stiffness matrix that I have already compressed, and I have solved a system Au = B with it before. A is dense, because my problem is non-local.
M is the mass matrix. It is either very sparse or diagonal (a lumped mass matrix).
\sigma_j is just some number. u_{j+1} is the unknown, while u_j was calculated previously. f_j is also known.

I need to solve the above equation repeatedly, so I am interested in doing the compression and possibly the factorization only once, rather than at each step.

So, is it possible in STRUMPACK to sum the ULV factorized form of A with either a sparse or diagonal matrix?
Or I should use the HSS compressed form of A?

For the case where M is diagonal, it should be trivial to sum with the compressed matrix, since the compression does not change the diagonal. I think that this function should work:

void strumpack::HSS::HSSMatrix<scalar_t>::shift(scalar_t sigma)
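
If shift() indeed updates the diagonal of the compressed matrix, the time stepping for the diagonal-M case could look roughly like the sketch below. This is only an illustration: whether shift() adds to or scales the diagonal, whether re-factoring after each shift is supported, and the exact factor()/solve() names of the sequential HSS interface are all assumptions to be confirmed, and a general (non-identity) lumped mass matrix would need a true diagonal update rather than a scalar shift.

#include <vector>
#include "HSS/HSSMatrix.hpp"
#include "dense/DenseMatrix.hpp"

using namespace strumpack;
using namespace strumpack::HSS;

// Sketch: compress the dense stiffness matrix A once, then reuse the
// compression across time steps, only shifting the diagonal and re-factoring.
// For simplicity this assumes the lumped mass matrix is the identity.
void parabolic_steps(DenseMatrix<double>& A,
                     std::vector<DenseMatrix<double>>& rhs,  // one right-hand side per step
                     const std::vector<double>& sigma) {
  HSSOptions<double> opts;
  HSSMatrix<double> H(A, opts);     // compress A once
  double applied = 0.;              // scalar shift currently applied to H
  for (std::size_t j = 0; j < sigma.size(); j++) {
    double target = 1. / sigma[j];  // want H to represent A + (1/sigma_j) I
    H.shift(target - applied);      // assumed: shift() adds sigma to the diagonal
    applied = target;
    H.factor();                     // ULV factorization of the shifted matrix (assumed name)
    H.solve(rhs[j]);                // rhs[j] overwritten with u_{j+1} (assumed name)
  }
}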

For the case when M is sparse, the M matrix is generated on the same mesh as the stiffness matrix A, so maybe it is possible to use the same compression structure as A. M has about 5 nonzeros per row.

A C++ question about the implementation of "create_frontal_matrix" in "FrontFactory.cpp"

After reading the source code of FrontFactory.cpp, I have a question about the pointer front:

Its type is FrontalMatrix, but it is reset with a derived class FrontalMatrixXXX.
I'm not very familiar with C++, but as far as I know a base class pointer is not able to access its derived class members.
So I can't understand how front can access members of its derived class (like F11_, F12_).
Could you explain which C++ techniques are used here? Thanks very much.
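
This is standard C++ dynamic dispatch rather than anything STRUMPACK-specific: the base class pointer can only name what the base class declares, but a virtual call dispatches to the derived override, and that override runs inside the derived object and can freely use derived-only members such as F11_ or F12_. A minimal, self-contained illustration (the class and member names here are made up for the example):

#include <iostream>
#include <memory>

// Base class: declares only the virtual interface, knows nothing about F11_/F12_.
struct FrontalMatrixBase {
  virtual ~FrontalMatrixBase() = default;
  virtual void factor() = 0;            // derived classes implement this
};

// Derived class: owns the extra members and uses them inside its override.
struct FrontalMatrixDense : FrontalMatrixBase {
  double F11_ = 1.0, F12_ = 2.0;        // derived-only data members
  void factor() override {              // called through the base pointer
    std::cout << "factoring with F11_=" << F11_ << " F12_=" << F12_ << "\n";
  }
};

int main() {
  // The pointer type is the base class, but the object it owns is derived,
  // just like 'front' being reset with a FrontalMatrixXXX in FrontFactory.cpp.
  std::unique_ptr<FrontalMatrixBase> front;
  front.reset(new FrontalMatrixDense());
  front->factor();  // virtual dispatch: runs FrontalMatrixDense::factor(),
                    // which accesses F11_/F12_ even though 'front' is a base pointer
  return 0;
}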

[Feature Request] Creating DistributedMatrixWrapper matrix with preexisting (non-strumpack) grid.

Hello Pieter,

In previous versions of STRUMPACK, the DistributedMatrixWrapper would take the ctxt of the BLACS grid and the sizes and descriptor of the matrix to create STRUMPACK's DistributedMatrix class with preallocated data.

However, the newer DistributedMatrixWrapper requires a BLACSGrid object as input. How can I instruct it to use an already created grid?

Like so:

/* Init working 2D process grid */
blacs_get_( &i_negone, &i_zero, &ictxt );
blacs_gridinit_( &ictxt, "R", &nprow, &npcol );
blacs_gridinfo_( &ictxt, &nprow, &npcol, &myrow, &mycol );

Best Regards,
Dimitar Slavchev

Build error for Fortran

I tried to build the Fortran example from the STRUMPACK examples folder, but I got an error like this:

[login-3 build]$ cmake ..
-- Configuring done
-- Generating done
-- Build files have been written to: /home/kunet.ae/100059846/strum/build
[login-3 build]$ make
[ 50%] Linking CXX executable fexample
/lib/../lib64/crt1.o: In function `_start': (.text+0x24): undefined reference to `main'
make[2]: *** [CMakeFiles/fexample.dir/build.make:126: fexample] Error 1
make[1]: *** [CMakeFiles/Makefile2:95: CMakeFiles/fexample.dir/all] Error 2
make: *** [Makefile:103: all] Error 2

How to use BLR compression to solve a system of equations?

BLR compresses fine, but it fails with:

terminate called after throwing an instance of 'std::invalid_argument'
  what():  Operation solve not supported for this type.
Aborted (core dumped)

Running the testStructured example produces the same error. The preconditioned solve isn't called for BLR at all.

I can see that there is a function:

      void solve(const std::vector<int>& P, DenseM_t& x) const {
        x.laswp(P, true);
        trsm(Side::L, UpLo::L, Trans::N, Diag::U, scalar_t(1.), *this, x, 0);
        trsm(Side::L, UpLo::U, Trans::N, Diag::N, scalar_t(1.), *this, x, 0);
      }

in the BLRMatrix class, but it is completely unclear to me what P should be here. I assume that x is the right hand side.
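
For what it's worth, the BLR issue further down this page constructs and solves like this, which suggests P is the pivot vector filled in by the BLRMatrix constructor (the compression/factorization step) and then passed back to solve() together with the right-hand side. Condensed from that code:

#include <vector>
#include "dense/DenseMatrix.hpp"
#include "BLR/BLRMatrix.hpp"

using namespace strumpack;
using namespace strumpack::BLR;

void blr_solve_sketch(DenseMatrix<double>& A, DenseMatrix<double>& B,
                      std::vector<std::size_t>& tiles,
                      DenseMatrix<bool>& adm,
                      BLROptions<double>& opts) {
  std::vector<int> piv;                           // filled in by the constructor
  BLRMatrix<double> H(A, tiles, adm, piv, opts);  // compress + factor A
  H.solve(piv, B);                                // B is overwritten with the solution
}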

STRUMPACKKernel

Hello!

I am trying to run the example KernelRegression.py but I receive a "Segmentation fault (core dumped)" error. In particular, this is the output of the command "python KernelRegression.py":

Usage: python3 KernelRegression.py filename h lambda degree

  • 'filename' should refer to 4 files:
    filename_train.csv
    filename_train_label.csv
    filename_test.csv
    filename_test_label.csv
  • h: kernel width
  • lambda: regularization parameter
  • degree: ANOVA kernel degree
    n = 10000 d = 8 m = 1000
    C++, creating kernel: n=10000, d=8 h=1.3 lambda=3.11
    Segmentation fault (core dumped)
    ==========================
I can also add the following results for the command "make test":

The following tests FAILED:
29 - SPARSE_seq_4 (Subprocess aborted)
30 - SPARSE_seq_5 (Subprocess aborted)
31 - SPARSE_seq_6 (Subprocess aborted)
35 - SPARSE_seq_10 (Subprocess aborted)
36 - SPARSE_seq_11 (Subprocess aborted)
37 - SPARSE_seq_12 (Subprocess aborted)
41 - SPARSE_seq_16 (Subprocess aborted)
42 - SPARSE_seq_17 (Subprocess aborted)
43 - SPARSE_seq_18 (Subprocess aborted)
47 - SPARSE_seq_22 (Subprocess aborted)
48 - SPARSE_seq_23 (Subprocess aborted)
49 - SPARSE_seq_24 (Subprocess aborted)
53 - SPARSE_seq_28 (Subprocess aborted)
54 - SPARSE_seq_29 (Subprocess aborted)
55 - SPARSE_seq_30 (Subprocess aborted)
59 - SPARSE_seq_34 (Subprocess aborted)
60 - SPARSE_seq_35 (Subprocess aborted)
61 - SPARSE_seq_36 (Subprocess aborted)
65 - SPARSE_seq_40 (Subprocess aborted)
66 - SPARSE_seq_41 (Subprocess aborted)
70 - SPARSE_seq_45 (Subprocess aborted)
71 - SPARSE_seq_46 (Subprocess aborted)
72 - SPARSE_seq_47 (Subprocess aborted)
Errors while running CTest
Makefile:91: recipe for target 'test' failed
make: *** [test] Error 8

=======================

Thank you very much!

Stefano

Some subroutines (for Fortran use) are not clear and can't be found in the documentation

Hi, could you please explain the meaning of the subroutines below and their arguments for Fortran use, which I can't find in the documentation?

call STRUMPACK_init_mt(S, STRUMPACK_DOUBLE, STRUMPACK_MT, 0, c_null_ptr, 1)

! use geometric nested dissection
ierr = STRUMPACK_reorder_regular(S, k, k, 1)

I only know the very basics of C++, so it is hard for me to call these C++ classes from Fortran code. Could you please give some examples of how the classes are called in C++ and in Fortran, respectively? Thanks.

(2) Another question: is it possible to set only the upper half of the matrix in CSR form, since it is symmetric?

An error occurred running test_HSS_seq in strumpack-3.1.1

Running the ./test_HSS_seq command shows the following error:
——————————————————————————————————
Running with:
OMP_NUM_THREADS=4 ./test_HSS_seq T 1000
tol = 0.01
created H matrix of dimension 1000 x 1000 with 4 levels
compression succeeded!
rank(H) = 9
memory(H) = 1.135 MB, 14.1875% of dense
relative error = ||A-H*I||_F/||A||_F = 0.0041072
absolute error = ||A-H*I||_F = 0.195952
relative error = ||H0*B0-A0*B0||_F/||A0*B0||_F = 0.000820075
relative error = ||H0'*B0-A0'*B0||_F/||A0'*B0||_F = 0.000820075
relative error = ||H1*B1-A1*B1||_F/||A1*B1||_F = 0.000979003
relative error = ||H1'*B1-A1'*B1||_F/||A1'*B1||_F = 0.000979003
extracting individual elements, avg error = 7.53244e-05
Segmentation fault (core dumped)
————————————————————————————————————
When looking for the cause, I found that the error is in the laswp() function.
Can you give me some suggestions or examples to solve this problem?
Thanks.

Make CUDA and MPI exclusive

Hello, maintainers.

I'm currently trying to integrate STRUMPACK CUDA support into PETSc and found that STRUMPACK_USE_MPI and STRUMPACK_USE_CUDA cannot be switched ON at the same time because nvcc can't process mpi.h which is included in /src/misc/Triplet.hpp.

I think these options should be handled as exclusive ones in CMakeLists.txt.

Spack installation failure

Installation using Spack fails with an error.

Using the commands

git clone https://github.com/spack/spack.git
. spack/share/spack/setup-env.sh
spack install strumpack

The output file is enclosed.
spack-build-out.txt

MPI Abort on BLR Matrix factorization.

I compiled STRUMPACK with intel MPI and am trying to run the testBLRMPI.cpp example. However, it fails with MPI abort and the following errors:

$ mpirun -n 1 ./testBLRMPI 
# compressing 1000 x 1000 Toeplitz matrix, with relative tolerance 0.0001
# ProcessorGrid2D: [1 x 1]
# from_ScaLAPACK done!
rank = 0 ABORTING!!!!!

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   RANK 0 PID 88883 RUNNING AT g0125
=   KILLED BY SIGNAL: 6 (Aborted)
===================================================================================

Here's the stacktrace when running it through GDB:

$ gdb --args ./testBLRMPI 
GNU gdb (GDB) Red Hat Enterprise Linux 7.6.1-110.el7
Copyright (C) 2013 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>...
Reading symbols from /home/acb10922qh/gitrepos/useful-tsubame-benchmarks/hmatrix-benchmarks/STRUMPACK-5.0.0/build/examples/testBLRMPI...(no debugging symbols found)...done.
(gdb) r
Starting program: /home/acb10922qh/gitrepos/useful-tsubame-benchmarks/hmatrix-benchmarks/STRUMPACK-5.0.0/build/examples/./testBLRMPI 
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
warning: File "/bb/apps/gcc/7.4.0/lib64/libstdc++.so.6.0.24-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "$debugdir:$datadir/auto-load:/usr/bin/mono-gdb.py".
To enable execution of this file add
	add-auto-load-safe-path /bb/apps/gcc/7.4.0/lib64/libstdc++.so.6.0.24-gdb.py
line to your configuration file "/home/acb10922qh/.gdbinit".
To completely disable this security protection add
	set auto-load safe-path /
line to your configuration file "/home/acb10922qh/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
	info "(gdb)Auto-loading safe path"
Detaching after fork from child process 89850.
# compressing 1000 x 1000 Toeplitz matrix, with relative tolerance 0.0001
# ProcessorGrid2D: [1 x 1]
# from_ScaLAPACK done!
rank = 0 ABORTING!!!!!

Program received signal SIGABRT, Aborted.
0x00002aaab5a96277 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.17-222.el7.x86_64 libibcm-41mlnx1-OFED.4.1.0.1.0.44100.x86_64 libibverbs-41mlnx1-OFED.4.4.1.0.0.44100.x86_64 libmlx4-41mlnx1-OFED.4.1.0.1.0.44100.x86_64 libmlx5-41mlnx1-OFED.4.4.0.1.7.44100.x86_64 libnl3-3.2.28-4.el7.x86_64 libpsm2-11.2.78-1.el7.x86_64 librdmacm-41mlnx1-OFED.4.2.0.1.3.44100.x86_64 librxe-41mlnx1-OFED.4.1.0.1.7.44100.x86_64 numactl-devel-2.0.9-7.el7.x86_64 ucx-1.4.0-1.44100.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) where
#0  0x00002aaab5a96277 in raise () from /lib64/libc.so.6
#1  0x00002aaab5a97968 in abort () from /lib64/libc.so.6
#2  0x0000000000409526 in abort_MPI(int*, int*, ...) ()
#3  0x00002aaaab7709fc in MPIR_Err_return_comm (comm_ptr=0x15ee0, fcname=0x15ee0 <Address 0x15ee0 out of bounds>, errcode=604577797) at ../../src/mpi/errhan/errutil.c:312
#4  0x00002aaaab3ca12a in PMPI_Bcast (buffer=0x15ee0, count=89824, datatype=6, root=-1, comm=-1431306624) at ../../src/mpi/coll/bcast/bcast.c:437
#5  0x000000000046d6e2 in void strumpack::MPIComm::broadcast_from<int>(std::vector<int, std::allocator<int> >&, int) const ()
#6  0x0000000000456869 in strumpack::BLR::BLRMatrixMPI<double>::factor(strumpack::DenseMatrix<bool> const&, strumpack::BLR::BLROptions<double> const&) ()
#7  0x0000000000456666 in strumpack::BLR::BLRMatrixMPI<double>::factor(strumpack::BLR::BLROptions<double> const&) ()
#8  0x0000000000409897 in main ()

error: # Geometric reordering only works on a simple 3 point wide stencil

I use STRUMPACK to solve the matrix system resulting from a spectral collocation method, and I got the error feedback below.
Could you please give me some suggestions?

ERROR: nested dissection went wrong, ierr=1
# ERROR: Geometric reordering failed. 
# Geometric reordering only works on a simple 3 point wide stencil
# on a regular grid and you need to provide the mesh sizes.

CSRGraph.hpp and -DSTRUMPACK_USE_MPI=OFF

Issue

When building with -DSTRUMPACK_USE_MPI=OFF, the build fails due to the following line:

#include "misc/MPIWrapper.hpp"

The "misc/MPIWrapper.hpp" header is included, but this fails on systems without MPI headers in the path. If MPI is installed and in the path but the user decides not to use it, compiling will work, but linking will not work since MPI is (of course) not linked against due to -DSTRUMPACK_USE_MPI=OFF.

Suggestion

Changing the above line to check for STRUMPACK_USE_MPI, as is done in other places in the code:

#if defined(STRUMPACK_USE_MPI)
#include "misc/MPIWrapper.hpp"
#endif

Result

I was able to build with -DSTRUMPACK_USE_MPI=OFF on a system with and without MPI installed after making the small change.

GPU Support

Hi,

I was wondering if GPU support (Nvidia/AMD) is on the roadmap?

Can I use the C interface of strumpack-2.1.0 to solve a linear system with a general matrix?

These days I am using strumpack-2.1.0 to solve a linear system with a general matrix, but my existing program is written in Fortran, so I may have to write a conversion interface between C++ and Fortran, which is very difficult for me. I found that there is a C interface, but it only comes with an example solving the 2D Poisson problem. Can the C interface solve a linear system with a general matrix, and can it be called from my Fortran program? Writing a conversion interface between C and Fortran may be feasible for me. Looking forward to your reply.
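
For what it is worth, the C interface is not limited to the 2D Poisson example; any CSR matrix can be passed to it, and the same entry points can be called from Fortran through ISO_C_BINDING. A minimal sketch modelled on the C example shipped with the library (the function names and argument lists below are my reading of StrumpackSparseSolver.h in recent releases and may differ in 2.1.0, where the initializer may instead require an MPI communicator):

#include "StrumpackSparseSolver.h"   // C interface header

// Sketch: solve A x = b for a general CSR matrix through the C interface,
// using the multithreaded (non-MPI) back end in double precision.
void c_interface_solve(int argc, char* argv[], int n,
                       const int* row_ptr, const int* col_ind,
                       const double* values, const double* b, double* x) {
  STRUMPACK_SparseSolver S;
  STRUMPACK_init_mt(&S, STRUMPACK_DOUBLE, STRUMPACK_MT, argc, argv, 1 /*verbose*/);
  // n is passed by address so the same call works for 32- and 64-bit integers
  STRUMPACK_set_csr_matrix(S, &n, row_ptr, col_ind, values, 0 /*symmetric_pattern*/);
  STRUMPACK_reorder(S);
  STRUMPACK_factor(S);
  STRUMPACK_solve(S, b, x, 0 /*use_initial_guess*/);
  STRUMPACK_destroy(&S);
}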

Transpose solve

Hi!

Does STRUMPACK support transposed solves without explicitly computing the transposed factorization?

Thanks!

HSS. Leaving uncompressed blocks and changing diagonal values.

Hello,

I have two questions:

  1. Is it possible to designate blocks that are not to be compressed? Or, in other words, is it possible to compress only part of a matrix and then solve a system of equations with it?

As an example, I have a matrix with 5 columns that are overwritten with -1s and 0s. I could reorder them, together with the corresponding rows and right-hand side entries, to the edge of the matrix; these blocks will not be low rank.

  2. Is it possible to change the diagonal values of an HSS matrix?

Currently there is a shift function that applies a scalar shift to the diagonal, but I need to add a full diagonal matrix to the compressed matrix. A sketch of what I currently do follows.
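
For reference, a minimal sketch of what I do now, assuming the HSSMatrix constructor from a DenseMatrix and the shift(sigma) member as used in the shipped HSS tests (exact names and semantics should be checked against HSS/HSSMatrix.hpp); what is missing is a way to add a general diagonal matrix instead of a single scalar:

#include "dense/DenseMatrix.hpp"
#include "HSS/HSSMatrix.hpp"

// Sketch: compress a dense matrix to HSS form and apply the existing
// scalar diagonal shift. Adding a full diagonal matrix (a different value
// per row) is the operation I cannot find in the API.
void compress_and_shift(strumpack::DenseMatrix<double>& A, double sigma) {
  strumpack::HSS::HSSOptions<double> opts;
  opts.set_rel_tol(1e-4);
  strumpack::HSS::HSSMatrix<double> H(A, opts);  // compress A to HSS form
  H.shift(sigma);                                // scalar shift of the diagonal
}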

BLR compression solver gives wrong result

Hello Pieter,

I am trying to solve a system of linear algebraic equations with the BLR solver, using testBLR_seq as an example. I have successfully solved the same problem with the HSS solver with acceptable accuracy. The matrix is not symmetric.

However, the results I receive are very different from those obtained with a direct LU solver (dgesv from MKL). In the MKL result all values are ~10^{1} or below, while BLR gives values between ~10^{6} and ~10^{12}.

I tried reducing rel_tol to 10^{-8} and even setting all tiles to be non-compressed (i.e. adm.fill(false);). I always receive the wrong answer.

Here is the code of the solver part that I use:

// STRUMPACK headers
#include <vector>
#include "dense/DenseMatrix.hpp"
#include "BLR/BLRMatrix.hpp"

using namespace std;
using namespace strumpack;
using namespace strumpack::BLR;

// my_size is the matrix size; D is the matrix (column-major order) and sv is the right-hand side
void my_solve(int my_size, double *D, double *sv)
{
    // BLR options (the commented-out line shows the HSS equivalent)
    //    HSSOptions<double> blr_opts;
    BLROptions<double> blr_opts;
    blr_opts.set_verbose(false);
    blr_opts.set_max_rank(my_size);

    blr_opts.set_rel_tol(1e-2);
    //blr_opts.set_abs_tol(1e-8);

    // Manually define the cluster tree
    strumpack::structured::ClusterTree tree(my_size);
    tree.refine(blr_opts.leaf_size());

    // Get a vector of tile sizes
    auto tiles = tree.leaf_sizes<std::size_t>();

    // Admissibility -- weak: only the diagonal tiles are inadmissible
    std::size_t nt = tiles.size();
    DenseMatrix<bool> adm(nt, nt);
    adm.fill(true);
    for (std::size_t t=0; t<nt; t++)
        adm(t, t) = false;
    std::vector<int> piv;

    // Wrap the raw arrays in DenseMatrix objects
    // rows = my_size, cols = my_size (1 for B), leading dimension = my_size
    DenseMatrix<double> A(my_size, my_size, D, my_size);
    DenseMatrix<double> B(my_size, 1, sv, my_size);

    // Compress (and factor) the matrix into BLR form directly from the constructor
    BLRMatrix<double> H(A, tiles, adm, piv, blr_opts);

    // Solve with the BLR factors and copy the solution back into sv
    H.solve(piv, B);
    copy(B, sv, my_size);
}
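
To narrow down where things go wrong, I also compute the residual of the BLR solution against the original matrix. A sketch of that check, assuming the gemm free function and DenseMatrix::normF from dense/DenseMatrix.hpp (the DenseMatrix constructor used here copies the raw arrays, so the original D and right-hand side must still be available):

#include "dense/DenseMatrix.hpp"

// Sketch: after my_solve has overwritten sv with the BLR solution x,
// compare A*x against a saved copy of the original right-hand side b.
double relative_residual(int my_size, const double *D, const double *x, const double *b)
{
    using strumpack::DenseMatrix;
    DenseMatrix<double> A(my_size, my_size, D, my_size);   // original matrix
    DenseMatrix<double> X(my_size, 1, x, my_size);         // BLR solution
    DenseMatrix<double> R(my_size, 1, b, my_size);         // starts as b
    DenseMatrix<double> B(my_size, 1, b, my_size);         // kept for the norm of b
    // R = b - A*X
    strumpack::gemm(strumpack::Trans::N, strumpack::Trans::N, -1., A, X, 1., R);
    return R.normF() / B.normF();
}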

Attached are the experiment matrix, the right-hand side, and the results from MKL and from BLR with rel_tol = $10^{-2}$:
BLR_test_1000.zip

Build error

Hello,
I am getting the following error while executing make (STRUMPACK-3.2.0 and 3.3.0):

[ 97%] Built target test_HSS_mpi
": internal error: ** The compiler has encountered an unexpected problem.
** Segmentation violation signal raised. **
Access violation or stack overflow. Please contact Intel Support for assistance.

I am using the following compilers mpiicc, mpiicpc, mpiifort with intel/19.0.4; impi/2019.4; mkl/2019.4 (same with intel/19.0.5)

I would appreciate any suggestions on how to overcome this problem.

Issues with using distributed-memory parallel direct solver via PETSc

Hi STRUMPACK-devs,

I'm running into issues when trying to use STRUMPACK as a (sparse) direct solver for a relatively straightforward 3D Poisson problem written using PETSc (link).

When running with 1 MPI rank and using CPU-native datatypes, everything works as expected with the following options:

mpirun -np 1 ./poisson3d -dm_mat_type aij -dm_vec_type standard \
 -ksp_type gmres -pc_type lu \
 -pc_factor_mat_solver_type strumpack -mat_strumpack_verbose \
 -ksp_monitor -ksp_view -log_view

And I observe the following output: link.

However, the same program crashes when I run with 4 MPI ranks. Here's the error trace: link.

Could someone tell me if I'm making any mistakes when using STRUMPACK?

Thanks in advance!

Problem with blacs library

Hi,
I am trying to build STRUMPACK on a Debian computer. Everything runs OK until the compilation of the tests, which complains about a missing BLACS symbol:
/usr/bin/ld: CMakeFiles/test_HSS_mpi.dir/test_HSS_mpi.cpp.o: undefined reference to symbol 'Cblacs_exit'
//usr/lib/libblacs-openmpi.so.1: error adding symbols: DSO missing from command line
collect2: error: ld returned 1 exit status
make[2]: *** [test/test_HSS_mpi] Error 1
make[1]: *** [test/CMakeFiles/test_HSS_mpi.dir/all] Error 2
make[1]: *** Waiting for unfinished jobs....
/usr/bin/ld: CMakeFiles/test_sparse_mpi.dir/test_sparse_mpi.cpp.o: undefined reference to symbol 'Cblacs_exit'
//usr/lib/libblacs-openmpi.so.1: error adding symbols: DSO missing from command line

But BLACS was found by CMake during configuration. If I simply add "-L/usr/lib -lblacs-openmpi" to the link.txt file, the link succeeds and all the tests pass.

Using STRUMPACK on GPUs via PETSc

Hi STRUMPACK-devs,

I'd like to use STRUMPACK on GPUs via the PETSc interface. However, when I pass GPU-native matrices/vectors (either as CUDA-specific data types or as Kokkos data types whose default exec space is CUDA), the routine set_csr_matrix crashes, similar to the crash described in #50.

How do I pass matrices/vectors that already reside on the GPU to STRUMPACK via the PETSc interface?

I'd like to add that I was, however, able to use GPUs when running the same program with CPU-native data types and letting STRUMPACK offload them to the GPU, but this mode of operation was quite slow compared to other direct solvers like superlu-dist. The code for the problem is here (3D Poisson on a 64^3 grid, averaged over 5 solves after a warm-up solve in which the factorization is computed). Here are the detailed log files (available as links from the timing numbers) from both cases (run on a pair of Quadro P4000s), should that help:

Solver              Matrix/Vector types    Time, 1 GPU    Time, 2 GPUs
STRUMPACK-offload   aij, standard          1.6589e+01     1.3489e+01
Superlu-dist        aij-cusparse, cuda     5.6872e+00     4.9308e+00
Superlu-dist        aij-kokkos, kokkos     5.7394e+00     3.7963e+00
