
elpa's People

Contributors

alheinecke, angererc, apoeppl, arjunramaswami22, fweimer-rh, juntangc, karpov-peter, lhuedepohl, marekandreas, mszpindler, ohlmann, ppkk, terminationshock

elpa's Issues

configure issue in finding cublasDgemm

I am trying to install ELPA 2021.05.001 using the GNU C and Fortran compilers on a server with Intel Xeon Gold CPUs and NVIDIA V100 GPUs. For some reason the configure script is unable to find the library containing cublasDgemm, even though it detects "nvcc", as you can see in the attached screenshot. All the CUDA paths are clearly specified, as you can see in the attached config.log.
I am unable to understand what I am missing here!
config.log

(Screenshot attached: 2021-06-29, 8:26 PM)

2021.05.002: configure still tests avx when avx is disabled by --disable-avx --disable-avx2

checking whether we can compile AVX512 gcc intrinsics in C... no
configure: error: Could not compile a test program with AVX512, adjust the C compiler or CFLAGS. Possibly (some of) the flags "  -mmmx -msse -msse2 -mssse3 -msse4.1 -msse4.2 " solve this issue
===>  Script "configure" failed unexpectedly.

config.log:

configure:13737: checking whether we can compile AVX512 gcc intrinsics in C
configure:13751: cc -c -fopenmp -O2 -pipe -fno-omit-frame-pointer  -mmmx -msse -msse2 -mssse3 -msse4.1 -msse4.2 -mavx -msse3 -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing  -fno-omit-frame-pointer -isystem /usr/local/include conftest.c >&5
conftest.c:47:17: error: always_inline function '_mm512_load_pd' requires target feature 'avx512f', but would be inlined into function 'main' that is compiled without support for 'avx512f'
   __m512d q1 = _mm512_load_pd(q);
                ^
conftest.c:47:17: error: AVX vector return of type '__m512d' (vector of 8 'double' values) without 'avx512f' enabled changes the ABI
conftest.c:48:17: error: always_inline function '_mm512_fmadd_pd' requires target feature 'avx512f', but would be inlined into function 'main' that is compiled without support for 'avx512f'
   __m512d y1 = _mm512_fmadd_pd(q1, q1, q1);
                ^
conftest.c:48:17: error: AVX vector argument of type '__m512d' (vector of 8 'double' values) without 'avx512f' enabled changes the ABI
4 errors generated.
configure:13751: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "elpa"
[...]
configure:13758: result: no
configure:13761: error: Could not compile a test program with AVX512, adjust the C compiler or CFLAGS. Possibly (some of) the flags "  -mmmx -msse -msse2 -mssse3 -msse4.1 -msse4.2 " solve this issue

OS: FreeBSD 13

Build breaks for no apparent reason

2021.05.002_bugfix breaks with this log.

Configure environment:
F77="gfortran10" FC="gfortran10" FFLAGS="-O -Wl,-rpath=/usr/local/lib/gcc10" FCFLAGS="-std=legacy -I/usr/local/include -I/disk-samsung/freebsd-ports/math/elpa/work/elpa-2021.05.002_bugfix -Isrc -Isrc/general -Wl,-rpath=/usr/local/lib/gcc10" MAKE=gmake ac_cv_path_PERL=/usr/local/bin/perl ac_cv_path_PERL_PATH=/usr/local/bin/perl PERL_USE_UNSAFE_INC=1 XDG_DATA_HOME=/disk-samsung/freebsd-ports/math/elpa/work XDG_CONFIG_HOME=/disk-samsung/freebsd-ports/math/elpa/work XDG_CACHE_HOME=/disk-samsung/freebsd-ports/math/elpa/work/.cache HOME=/disk-samsung/freebsd-ports/math/elpa/work PATH=/disk-samsung/freebsd-ports/math/elpa/work/.bin:/home/yuri/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin SHELL=/bin/sh CONFIG_SHELL=/bin/sh ADDR2LINE="/usr/local/bin/addr2line" AR="/usr/local/bin/ar" AS="/usr/local/bin/as" CPPFILT="/usr/local/bin/c++filt" GPROF="/usr/local/bin/gprof" LD="/usr/local/bin/ld" NM="/usr/local/bin/nm" OBJCOPY="/usr/local/bin/objcopy" OBJDUMP="/usr/local/bin/objdump" RANLIB="/usr/local/bin/ranlib" READELF="/usr/local/bin/readelf" SIZE="/usr/local/bin/size" STRINGS="/usr/local/bin/strings" CMAKE_PREFIX_PATH="/usr/local" CONFIG_SITE=/disk-samsung/freebsd-ports/Templates/config.site lt_cv_sys_max_cmd_len=524288

Configure arguments:

--disable-avx --disable-avx2 --disable-avx512  --disable-static  --disable-c-tests --without-mpi --disable-openmp --disable-sse --disable-sse-assembly --prefix=/usr/local ${_LATE_CONFIGURE_ARGS}

gcc-10
FreeBSD 13

Has AMD GPU on multiple nodes been tested yet?

Hi, the documentation says "multi-GPU runs on multiple nodes have not been tested". Is this still the case? We want to use ELPA on an AMD GPU cluster; can ELPA handle this now?

Issues when compiling ELPA in LUMI with CRAY compilers.

Hello, I would like to report three different issues found when compiling ELPA (2022.11.001) on the LUMI supercomputer.
I have successfully managed to compile and run the libraries (both with and without AMD GPU support) after addressing these issues appropriately.


  1. `elpa_impl_math_template.F90` does not conform to the Fortran standard and thus the Cray compilers fail. The issue lies in the following lines:

#ifdef COMPLEXCASE
#ifdef DOUBLE_PRECISION_COMPLEX
& !bind(C, name="elpa_solve_tridiagonal_dc")
#endif
#ifdef SINGLE_PRECISION_COMPLEX
& !bind(C, name="elpa_solve_tridiagonal_fc")
#endif
#endif

According to GNU standards, there cannot be a comment-only line within a line continuation. While I think that Cray is being very pedantic about it, changing those lines to something like the following fixes the issue:

    subroutine elpa_solve_tridiagonal_&
                    &ELPA_IMPL_SUFFIX&
#ifdef REALCASE
#ifdef DOUBLE_PRECISION_REAL
                    &_c(handle, d_p, e_p, q_p, error) bind(C, name="elpa_solve_tridiagonal_d")
#endif
#ifdef SINGLE_PRECISION_REAL
                    &_c(handle, d_p, e_p, q_p, error) bind(C, name="elpa_solve_tridiagonal_f")
#endif
#endif
#ifdef COMPLEXCASE
#ifdef DOUBLE_PRECISION_COMPLEX
                    &_c(handle, d_p, e_p, q_p, error) !bind(C, name="elpa_solve_tridiagonal_dc")
#endif
#ifdef SINGLE_PRECISION_COMPLEX
                    &_c(handle, d_p, e_p, q_p, error) !bind(C, name="elpa_solve_tridiagonal_fc")
#endif
#endif

  2. C and C++ tests should be disabled. I was not able to gather more information on these, but in case you need it, the errors I found are as follows:
  • CPP tests fail to compile with a "source file is not valid UTF-8" fatal error. It seems as if the compiler was trying to re-compile from the object files instead of the source *.cpp files.

  • C tests fail with the same error in all of the validate_* files, probably due to a linking issue:

ld.lld: error: ./.libs/libelpa.so: undefined reference to .omp_offloading.img_size.cray_amdgcn-amd-amdhsa [--no-allow-shlib-undefined]
ld.lld: error: ./.libs/libelpa.so: undefined reference to .omp_offloading.img_cache.cray_amdgcn-amd-amdhsa [--no-allow-shlib-undefined]
ld.lld: error: ./.libs/libelpa.so: undefined reference to .omp_offloading.img_start.cray_amdgcn-amd-amdhsa [--no-allow-shlib-undefined]
ld.lld: error: ./.libs/libelpa.so: undefined reference to .omp_offloading.img_size.cray_amdgcn-amd-amdhsa [--no-allow-shlib-undefined]
ld.lld: error: ./.libs/libelpa.so: undefined reference to .omp_offloading.img_cache.cray_amdgcn-amd-amdhsa [--no-allow-shlib-undefined]
ld.lld: error: ./.libs/libelpa.so: undefined reference to .omp_offloading.img_start.cray_amdgcn-amd-amdhsa [--no-allow-shlib-undefined]

  3. Last but not least, when loading certain combinations of modules, libtool adds two dangling "-L" at the end of the postdeps variable. For example:

postdeps="-lc -lcsup -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lgcc -lclang_rt.builtins-x86_64 -L -L"

This results in a compiler crash saying that "-L" cannot be empty. I could not trace the whole thing down, but apparently something goes wrong when configure parses the environment.

My solution for now is to remove those extra "-L" from libtool after the configure step is done.


If I can provide any additional information on this, please do let me know!

Problem With mpiifort and mpiicc

  1. cd elpa-master
  2. ./autogen.sh
  3. mkdir build
  4. cd build
  5. ../configure FC=mpiifx CC=mpiicx --prefix=/opt/elpa FCFLAGS="-O3 -xAVX2" CFLAGS="-O3 -xAVX2" --enable-option-checking=fatal SCALAPACK_LDFLAGS="-L/home/aarav/intel/mkl/2024.0/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread " SCALAPACK_FCFLAGS="-I/home/aarav/intel/mkl/2024.0/lib/intel64/lp64" --disable-avx512 --with-mpi=yes
  6. I get an error like this:
    /usr/bin/ld: ./.libs/libelpatest.a(libelpatest_la-test_analytic.o): undefined reference to symbol 'mpi_gather_'
    /usr/bin/ld: /home/aarav/intel/mpi/2021.11/lib/libmpifort.so.12: error adding symbols: DSO missing from command line
    collect2: error: ld returned 1 exit status
    make[1]: *** [Makefile:72450: validate_cpp_version_complex_double_eigenvalues_1stage_analytic] Error 1
    make[1]: Leaving directory '/home/aarav/Videos/elpa-master/build'
    make: *** [Makefile:68494: all] Error 2

Log file attached:
config.log

Cannot build elpa-2022.11.001.rc1 with CUDA

I downloaded elpa-2022.11.001.rc1 and tried the following:

cd elpa
mkdir build
cd build
../configure FC=ftn CC=cc CXX=CC --enable-nvidia-gpu --with-cuda-path=$CUDA_HOME --with-NVIDIA-GPU-compute-capability=sm_80 --disable-sse-assembly --disable-sse
make -j4

Ran into this error:

../src/elpa_api.F90:1117:2: fatal error: ./GPU/handle_destruction_template.F90: No such file or directory

Could it be that somebody forgot to commit the file GPU/handle_destruction_template.F90? I guess this also indicates a problem in the CI.

ELPA 2021.05.001.rc1: ELPA_2STAGE_REAL_GPU is missing if configured with --enable-gpu=yes

It is acknowledged that --enable-gpu=yes is legacy (given that other GPUs are starting to gain support in ELPA). However, it would be nice to carry forward the previous set of PARAMETER values (or enumeration values) if a legacy configuration is still allowed. For Fortran code, the module can be used or the header file (constants) can be included. For the latter, one can check whether for instance ELPA_2STAGE_REAL_GPU works (#if defined(ELPA_2STAGE_REAL_GPU)), but this is not straightforward or possible with a pure module file approach.
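For illustration, such a header-based feature test could look like the following sketch (hypothetical consumer code, not from CP2K or ELPA; it assumes the constants are visible to the preprocessor, as described in this report, and that ELPA_2STAGE_REAL_GENERIC is available as a fallback):

/* pick the legacy GPU kernel constant if this ELPA build exposes it,
   otherwise fall back to the generic kernel */
#include <elpa/elpa.h>
#include <stdio.h>

#if defined(ELPA_2STAGE_REAL_GPU)
#  define MY_REAL_KERNEL ELPA_2STAGE_REAL_GPU      /* legacy GPU constant present */
#else
#  define MY_REAL_KERNEL ELPA_2STAGE_REAL_GENERIC  /* fallback kernel */
#endif

int main(void)
{
    printf("selected real 2-stage kernel id: %d\n", (int) MY_REAL_KERNEL);
    return 0;
}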

This report is also about ELPA_2STAGE_REAL_GPU (for instance) being missing if --enable-gpu=no is given at configuration time. With the 2020 release of ELPA this apparently worked, i.e., ELPA_2STAGE_REAL_GPU was part of the Fortran module. I wonder how this can ever be valid for cross-compiled module files. Should support for an ELPA module file be dropped, or should a request for ELPA_2STAGE_REAL_GPU be redirected to something else (fallback)?

This is a request to keep the ELPA interface as stable as possible or to change it less frequently, or to (alternatively/finally) stop exposing internals of ELPA as part of the installed bits ("interface"). If, for instance, elpa_constants.h is exposed, it should be rather stable or not exist at all.

This issue can be of interest for the (upcoming) CP2K 8.2 (@mkrack @oschuett @dev-zero @alazzaro).

configure can't link to cublas

config.log
Trying to build elpa-2021.05.002 on my personal machine (Ubuntu 22.04 LTS) with an Intel Core i7-9750H and an NVIDIA GeForce GTX 1650.
CUDA is installed in /usr/local/cuda and nvcc --version works perfectly. /usr/local/cuda/lib64/libcublas.so also exists. But configure errors out, saying it cannot link to cublas. Please see the attached config.log.
Edit: attached screenshot

(Screenshot attached: 2022-12-09, 04:24)

Thanks in advance

Error in gcc < 7.4: initializer element is not constant

I get a compiler error in the file elpa_index.c using gcc 7.3:

#ifdef WITH_AMD_GPU_VERSION
#define default_max_stored_rows 256
#else
int const default_max_stored_rows = 256;
#endif

Old compiler versions don't treat this kind of definition as a constant, and you can't use these variables to initialize structures at compile time. Using a "#define" instead of an "int const" will work with any compiler.
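For illustration, a minimal stand-alone example of the failure mode (not ELPA's actual elpa_index.c) is:

/* compile with: gcc -c const_init.c */
#include <stdio.h>

#define DEFAULT_MAX_STORED_ROWS_DEFINE 256      /* always a constant expression */
static const int default_max_stored_rows = 256; /* not a constant expression in C */

struct defaults { int max_stored_rows; };

/* accepted by any C compiler: */
static struct defaults d_ok = { DEFAULT_MAX_STORED_ROWS_DEFINE };

/* older gcc (e.g. 7.3, as reported above) and strictly conforming compilers
   reject the following with "initializer element is not constant":
static struct defaults d_bad = { default_max_stored_rows };
*/

int main(void)
{
    printf("%d %d\n", d_ok.max_stored_rows, default_max_stored_rows);
    return 0;
}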

Issues in the bandred routines

I am one of the CP2K developers and I am currently attempting to enforce block sizes of powers of 2 whenever we employ ELPA to solve eigenvalue problems (see cp2k/cp2k#2407). The code works fine on CPU but not on GPU, where ELPA occasionally throws

ELPA2: bandred returned an error. Aborting...
Problem getting option for debug settings. Aborting...

This happens repeatedly with different kinds of tests. We suspect that ELPA has never been run on GPU for these cases.

Compile error: generated manually_preprocessed file has syntax error

Hi, I am compiling ELPA with Intel MPI, Intel Fortran (2021) and gcc/g++ (11.2), and I get the following error:

make[1]: Entering directory '/work1/jrf/tool/elpa/build-mpi'
  PPFC     src/libelpa_public_la-elpa_constants.lo
manually_preprocessed_.._src_elpa_constants.F90-src_.libs_libelpa_public_la-elpa_constants.o.F90(67): error #5082: Syntax error, found END-OF-STATEMENT when expecting one of: %FILL <IDENTIFIER>
 integer(kind=C_INT), parameter :: 
-----------------------------------^
[... the same error #5082 is repeated for the 'integer(kind=C_INT), parameter ::' lines 69, 71, ..., 99 of the preprocessed file ...]
manually_preprocessed_.._src_elpa_constants.F90-src_.libs_libelpa_public_la-elpa_constants.o.F90(102): error #5276: Unbalanced parentheses
) 
^
manually_preprocessed_.._src_elpa_constants.F90-src_.libs_libelpa_public_la-elpa_constants.o.F90(102): error #5082: Syntax error, found ')' when expecting one of: <LABEL> <END-OF-STATEMENT> ; <IDENTIFIER> TYPE MODULE ELEMENTAL IMPURE NON_RECURSIVE ...
) 
^
[... the same error #5082 is repeated for the 'integer(kind=C_INT), parameter ::' lines 105, 107, ..., 129 of the preprocessed file ...]
/tmp/ifort9zzC8J.i90(133): catastrophic error: Too many errors, exiting

What could cause this?

Compile command:

 FC=mpiifort  CC=mpicc CXX=mpicxx ../configure \
 FCFLAGS="-O3 -march=core-avx2 " \
 CFLAGS=" -O3 -mavx2 -mfma -funsafe-loop-optimizations -funsafe-math-optimizations -ftree-vect-loop-version -ftree-vectorize"  CXXFLAGS="-O3 -mfma -funsafe-loop-optimizations -funsafe-math-optimizations -ftree-vect-loop-version -ftree-vectorize -mavx2 " \
 --enable-option-checking=fatal \
 SCALAPACK_LDFLAGS=" -L$MKL_HOME/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread " \
 SCALAPACK_FCFLAGS="-I$MKL_HOME/include  -I$MKL_HOME/include/intel64/lp64 -L$MKL_HOME/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread " \
 --enable-avx2 --disable-avx512-kernels

Using other datatypes than double on GPUs

I have been successful in using ELPA2 on an NVIDIA GPU for the type "double". However, the same test code fails for other data types such as "float" with the error
Assertion `error_elpa==ELPA_OK' failed
after the call to
elpa_eigenvectors_float(handle, a, ev, z, &error_elpa);
My understanding is that the only difference between the "double" version and the "float" version is the datatype of a, ev, and z, which should match the function call. But maybe I am missing something else?
I tried using the data (a, ev, z) on CPU and on GPU, with the same result.

Installation with spack is not complete

The following error is displayed when trying spack install elpa %[email protected] ^[email protected] ^intel-mkl threads=openmp

1 error found in build log:
     220    configure: WARNING:  * allow ELPA at runtime to change the number of threads to 1 by setting "--enable-runtime-threading-support-checks
     221    configure: WARNING:     --enable-allow-thread-limiting --without-threading-support-check-during-build": this will ensure correct results, but
     222    configure: WARNING:     maybe not the best performance (depends on the threading of your blas/lapack libraries), see the USER_GUIDE
     223    configure: WARNING:  * switch of the checking of threading support "--disable-runtime-threading-support-checks
     224    configure: WARNING:    --without-threading-support-check-during-build: DO THIS AT YOUR OWN RISK! This will be fast, but might
     225    configure: WARNING:    (depending on your MPI library sometimes) lead to wrong results
  >> 226    configure: error: You do have to take an action of the choices above!

complex_double_eigenvectors_2stage check fails with AMD gpu stream enabled

With the --enable-gpu-streams=amd configure flag, validate_complex_double_eigenvectors_2stage_default_kernel_gpu_analytic_default.sh (in fact, all 2-stage GPU kernel tests) fails with "Invalid DeviceId less than 0" from the HIP runtime.

The backtrace looks like this:

Backtrace from running ABACUS

:0:/tmp/yizeyi18/spack-stage/spack-stage-hip-5.7.0-qjrvxutegybi3rhpu4mctpd2temd7krn/spack-src/clr/hipamd/src/hip_fatbin.hpp:50  : 614193288254 us: [pid:509108 tid:0x155546b83cc0] Invalid DeviceId less than 0

Thread 1 "abacus" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=23456002882752) at ./nptl/pthread_kill.c:44
44      ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0  __pthread_kill_implementation (no_tid=0, signo=6, threadid=23456002882752) at ./nptl/pthread_kill.c:44
#1  __pthread_kill_internal (signo=6, threadid=23456002882752) at ./nptl/pthread_kill.c:78
#2  __GI___pthread_kill (threadid=23456002882752, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3  0x0000155551842476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4  0x00001555518287f3 in __GI_abort () at ./stdlib/abort.c:79
#5  0x00001555033018e6 in hip::FatBinaryInfo::DeviceIdCheck (device_id=device_id@entry=-1, this=<optimized out>)
    at /tmp/yizeyi18/spack-stage/spack-stage-hip-5.7.0-qjrvxutegybi3rhpu4mctpd2temd7krn/spack-src/clr/hipamd/src/hip_fatbin.hpp:50
#6  hip::FatBinaryInfo::BuildProgram (this=<optimized out>, device_id=device_id@entry=-1)
    at /tmp/yizeyi18/spack-stage/spack-stage-hip-5.7.0-qjrvxutegybi3rhpu4mctpd2temd7krn/spack-src/clr/hipamd/src/hip_fatbin.cpp:335
#7  0x000015550330576e in hip::Function::getStatFunc (this=0x5555564bffc0, hfunc=hfunc@entry=0x7fffffff7b18, deviceId=deviceId@entry=-1)
    at /tmp/yizeyi18/spack-stage/spack-stage-hip-5.7.0-qjrvxutegybi3rhpu4mctpd2temd7krn/spack-src/clr/hipamd/src/hip_global.cpp:132
#8  0x00001555032be473 in hip::StatCO::getStatFunc (this=0x555555e25cd0, hfunc=hfunc@entry=0x7fffffff7b18,
    hostFunction=hostFunction@entry=0x155555435378 <my_unpack_c_hip_kernel_complex_double(int, int, int, int, int, HIP_vector_type<double, 2u>*, HIP_vector_type<double, 2u>*, int)>,
    deviceId=deviceId@entry=-1) at /tmp/yizeyi18/spack-stage/spack-stage-hip-5.7.0-qjrvxutegybi3rhpu4mctpd2temd7krn/spack-src/clr/hipamd/src/hip_code_object.cpp:848
#9  0x000015550344b54c in PlatformState::getStatFunc (this=<optimized out>, hfunc=hfunc@entry=0x7fffffff7b18,
    hostFunction=hostFunction@entry=0x155555435378 <my_unpack_c_hip_kernel_complex_double(int, int, int, int, int, HIP_vector_type<double, 2u>*, HIP_vector_type<double, 2u>*, int)>,
    deviceId=deviceId@entry=-1) at /tmp/yizeyi18/spack-stage/spack-stage-hip-5.7.0-qjrvxutegybi3rhpu4mctpd2temd7krn/spack-src/clr/hipamd/src/hip_platform.cpp:858
#10 0x000015550344b5a8 in ihipLaunchKernel (hostFunction=0x155555435378 <my_unpack_c_hip_kernel_complex_double(int, int, int, int, int, HIP_vector_type<double, 2u>*, HIP_vector_type<double, 2u>*, int)>,
    gridDim=..., blockDim=..., args=0x7fffffff8080, sharedMemBytes=0, stream=0x1555036aa180 <vtable for hip::Stream+16>, startEvent=0x0, stopEvent=0x0, flags=0)
    at /tmp/yizeyi18/spack-stage/spack-stage-hip-5.7.0-qjrvxutegybi3rhpu4mctpd2temd7krn/spack-src/clr/hipamd/src/hip_platform.cpp:568
#11 0x0000155503421cb2 in hipLaunchKernel_common (hostFunction=<optimized out>,
    hostFunction@entry=0x155555435378 <my_unpack_c_hip_kernel_complex_double(int, int, int, int, int, HIP_vector_type<double, 2u>*, HIP_vector_type<double, 2u>*, int)>, gridDim=..., blockDim=...,
    args=<optimized out>, args@entry=0x7fffffff8080, sharedMemBytes=<optimized out>, stream=<optimized out>)
    at /tmp/yizeyi18/spack-stage/spack-stage-hip-5.7.0-qjrvxutegybi3rhpu4mctpd2temd7krn/spack-src/clr/hipamd/src/hip_module.cpp:672
#12 0x000015550342c333 in hipLaunchKernel (hostFunction=0x155555435378 <my_unpack_c_hip_kernel_complex_double(int, int, int, int, int, HIP_vector_type<double, 2u>*, HIP_vector_type<double, 2u>*, int)>,
    gridDim=..., blockDim=..., args=<optimized out>, sharedMemBytes=<optimized out>, stream=<optimized out>)
    at /tmp/yizeyi18/spack-stage/spack-stage-hip-5.7.0-qjrvxutegybi3rhpu4mctpd2temd7krn/spack-src/clr/hipamd/src/hip_module.cpp:679
#13 0x00001555552391c9 in launch_my_unpack_c_hip_kernel_complex_double (row_count=<optimized out>, n_offset=<optimized out>, max_idx=<optimized out>, stripe_width=<optimized out>, a_dim2=<optimized out>,
    stripe_count=<optimized out>, l_nev=<optimized out>, row_group_dev=<optimized out>, a_dev=<optimized out>, my_stream=<optimized out>) at ../src/GPU/ROCm/hipUtils_template.cpp:356
#14 0x000015555514838c in gpu_c_kernel::launch_my_unpack_gpu_kernel_complex_double (row_count=<error reading variable: Cannot access memory at address 0x4d5f3241444d5f32>, n_offset=0, max_idx=615,
    stripe_width=1024, a_dim2=2880, stripe_count=<error reading variable: Cannot access memory at address 0x5756524c5f314c44>, l_nev=<optimized out>, row_group_dev=<optimized out>, a_dev=<optimized out>,
    my_stream=<optimized out>) at ../src/elpa2/GPU/interface_c_gpu_kernel.F90:376
#15 0x0000155555149534 in pack_unpack_gpu::unpack_row_group_complex_gpu_double (obj=..., row_group_dev=23447635755008, a_dev=23437523288064, stripe_count=1, stripe_width=1024, last_stripe_width=615,
    a_dim2=2880, l_nev=615, rows=<error reading variable: value requires 629760 bytes, which is more than max-value-size>, n_offset=0, row_count=64, wantdebug=.FALSE., allcomputeongpu=.FALSE.,
    my_stream=93825182226512) at ../src/elpa2/pack_unpack_gpu.F90:362
#16 0x000015555514a84d in pack_unpack_gpu::unpack_and_prepare_row_group_complex_gpu_double (obj=..., row_group=<error reading variable: value requires 629760 bytes, which is more than max-value-size>,
    row_group_dev=23447635755008, a_dev=23437523288064, stripe_count=1, stripe_width=1024, last_stripe_width=615, a_dim2=2880, l_nev=615, row_group_size=64, nblk=64, unpack_idx=64, next_unpack_idx=65,
    force=.FALSE., wantdebug=.FALSE., allcomputeongpu=.FALSE., my_stream=93825182226512) at ../src/elpa2/pack_unpack_gpu.F90:429
#17 0x00001555550e5c21 in elpa2_compute::trans_ev_tridi_to_band_complex_double (obj=..., na=2816, nev=615, nblk=64, nbw=64, q=..., ldq=2816, matrixcols=2816,
    hh_trans=<error reading variable: value requires 64880640 bytes, which is more than max-value-size>, my_pe=0, mpi_comm_rows=14, mpi_comm_cols=15, wantdebug=.FALSE., usegpu=.TRUE., max_threads_in=1,
    success=.TRUE., kernel=23) at ../src/elpa2/elpa2_trans_ev_tridi_to_band_template.F90:1140
#18 0x000015555519af5e in elpa2_impl::elpa_solve_evp_complex_2stage_a_h_a_double_impl (obj=..., aextern=..., evextern=..., qextern=...) at ../src/elpa2/elpa2_template.F90:1403
#19 0x000015555504f64d in elpa_impl::elpa_eigenvectors_a_h_a_dc (self=..., a=..., ev=..., q=..., error=32767) at ../src/elpa_impl_math_solvers_template.F90:126
#20 0x0000155555058d0b in elpa_impl::elpa_eigenvectors_a_h_a_dc_c (handle=<optimized out>, a_p=<optimized out>, ev_p=<optimized out>, q_p=<optimized out>, error=32767)
    at ../src/elpa_impl_math_solvers_template.F90:333
#21 0x0000555555aed081 in ELPA_Solver::eigenvector(std::complex<double>*, double*, std::complex<double>*) ()
#22 0x0000555555aee25f in ELPA_Solver::generalized_eigenvector(std::complex<double>*, std::complex<double>*, int&, double*, std::complex<double>*) ()
#23 0x0000555555817aac in hsolver::DiagoElpa<std::complex<double> >::diag(hamilt::Hamilt<std::complex<double>, psi::DEVICE_CPU>*, psi::Psi<std::complex<double>, psi::DEVICE_CPU>&, double*) ()
#24 0x000055555580fd51 in hsolver::HSolverLCAO<std::complex<double>, psi::DEVICE_CPU>::hamiltSolvePsiK(hamilt::Hamilt<std::complex<double>, psi::DEVICE_CPU>*, psi::Psi<std::complex<double>, psi::DEVICE_CPU>&, double*) ()
--Type <RET> for more, q to quit, c to continue without paging--
#25 0x000055555581174c in hsolver::HSolverLCAO<std::complex<double>, psi::DEVICE_CPU>::solveTemplate(hamilt::Hamilt<std::complex<double>, psi::DEVICE_CPU>*, psi::Psi<std::complex<double>, psi::DEVICE_CPU>&, elecstate::ElecState*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) ()
#26 0x0000555555811f85 in hsolver::HSolverLCAO<std::complex<double>, psi::DEVICE_CPU>::solve(hamilt::Hamilt<std::complex<double>, psi::DEVICE_CPU>*, psi::Psi<std::complex<double>, psi::DEVICE_CPU>&, elecstate::ElecState*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) ()
#27 0x000055555592741c in ModuleESolver::ESolver_KS_LCAO<std::complex<double>, std::complex<double> >::hamilt2density(int, int, double) ()
#28 0x00005555558e4e23 in ModuleESolver::ESolver_KS<std::complex<double>, psi::DEVICE_CPU>::Run(int, UnitCell&) ()
#29 0x00005555557bf514 in Relax_Driver<double, psi::DEVICE_CPU>::relax_driver(ModuleESolver::ESolver*) ()
#30 0x00005555557d1c99 in Driver::driver_run() ()
#31 0x00005555557d0ee5 in Driver::atomic_world() ()
#32 0x00005555557d1702 in Driver::init() ()
#33 0x00005555555ad1e4 in main ()

Undefined variable _XOR_EPI breaks builds with AVX512 on non Intel CPUs

Hi,
I found this issue when I was building ELPA on the AMD Zen 4 architecture with AVX512. The build broke with multiple errors like:

src/elpa2/kernels/complex_128bit_256bit_512bit_BLOCK_template.c:2574:37: warning: implicit declaration of function '_XOR_EPI' [-Wimplicit-function-declaration]
 2574 |         h1_real = (__SIMD_DATATYPE) _XOR_EPI((__m512i) h1_real, (__m512i) sign);
      |                                     ^~~~~~~~
src/elpa2/kernels/complex_128bit_256bit_512bit_BLOCK_template.c:2574:9: error: cannot convert a value of type 'int' to vector type '__vector(8) double' which has different size
 2574 |         h1_real = (__SIMD_DATATYPE) _XOR_EPI((__m512i) h1_real, (__m512i) sign);

According to 'configure.ac', either the 'HAVE_AVX512_XEON' or the 'HAVE_AVX512_XEON_PHI' macro should automatically be defined if the CPU has AVX512 support. This works on Intel CPUs; however, on AMD CPUs the configure script incorrectly defines HAVE_AVX512_XEON_PHI and leaves HAVE_AVX512_XEON undefined. This leads to execution of the code path guarded by the #ifdef HAVE_AVX512_XEON_PHI directive, in which an undefined macro _XOR_EPI is used. I'm sure the same error will also appear on Xeon Phi, as the logic is similar.

If I understood correctly, the _XOR_EPI macro should actually be replaced with _SIMD_XOR_EPI, which correctly expands to _mm512_xor_epi64 for AVX512 registers.

I'm attaching the patch with the fix to this issue (created from ELPA-2023.05.001):
ELPA-2023.05.001_fix_AVX512_support.patch
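For illustration, the effect of the corrected operation can be reproduced stand-alone (my own sketch, not the attached patch; it uses the cast intrinsics instead of the kernel's C-style casts and assumes _SIMD_XOR_EPI expands to _mm512_xor_epi64 as described above):

/* compile with: gcc -O2 -mavx512f xor_sketch.c */
#include <immintrin.h>
#include <stdint.h>
#include <stdio.h>

#define _SIMD_XOR_EPI _mm512_xor_epi64   /* per the description above */

int main(void)
{
    __m512d h1_real = _mm512_set1_pd(1.0);
    __m512i sign    = _mm512_set1_epi64(INT64_MIN);   /* 0x8000... sign-bit mask */

    /* corrected form of the operation in
       complex_128bit_256bit_512bit_BLOCK_template.c: */
    __m512d flipped = _mm512_castsi512_pd(
        _SIMD_XOR_EPI(_mm512_castpd_si512(h1_real), sign));

    double out[8];
    _mm512_storeu_pd(out, flipped);
    printf("%f\n", out[0]);   /* prints -1.000000 */
    return 0;
}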

Build of C/C++ tests with CUDA fails for 2022.11.001.rc2

Building the latest 2022.11.001.rc2 release with --enable-nvidia-gpu=yes fails with the following error message:

../test/C/test.c:171:10: fatal error: test/shared/GPU/test_gpu_vendor_agnostic_layerVariables.h: No such file or directory
  171 | #include "test/shared/GPU/test_gpu_vendor_agnostic_layerVariables.h"
      |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

With --disable-c-tests --disable-cpp-tests the build succeeds.

make error when use cuda support

The system environment is CentOS 7 and I'm using the Intel oneAPI toolkit and devtoolset-9, so the compiler is the latest ifort + gcc 9.
As for hardware, the CPU supports AVX512 and the GPU is an NVIDIA A100.

It can be compiled successfully without cuda support.
FC=mpiifort CC=mpiicc ../configure FCFLAGS="-O3 -xCORE-AVX512" CFLAGS="-O3 -xCORE-AVX512" --enable-option-checking=fatal SCALAPACK_LDFLAGS=" -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -ldl" SCALAPACK_FCFLAGS=" -I"${MKLROOT}/include"" --enable-avx2 --enable-avx512

However, if I add gpu support:
../configure FC=mpiifort CC=mpicc FCFLAGS="-O3 -xCORE-AVX512" CFLAGS="-O3 -march=skylake-avx512 -mfma -funsafe-loop-optimizations -funsafe-math-optimizations -ftree-vect-loop-version -ftree-vectorize" SCALAPACK_LDFLAGS=" -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_cdft_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl" SCALAPACK_FCFLAGS=" -qmkl=parallel" --enable-avx2 --enable-avx512 --enable-nvidia-gpu --with-cuda-path=/usr/local/cuda --with-NVIDIA-GPU-compute-capability=sm_80

I got this:

(screenshot of the error attached)

Pre-built ELPA package for debian

the repositories of your Linux distribution: there exist pre-build packages for a number of Linux distributions like Fedora, Debian, and OpenSuse

I was searching for ELPA in the Debian distribution packages, but I was not able to find it. What is the package name? Am I supposed to install it with sudo apt-get install pyelpa? Thanks!

nvidia-gpu cannot be set by exporting an environment variable in bash

According to the USERS_GUIDE:

The user can change this default value by setting an enviroment variable to the desired value.

The name of this variable is always constructed in the following way:

ELPA_DEFAULT_tunable_parameter_name=value

(By the way environment is misspelled as enviroment.)

However, this doesn't seem to work well with the nvidia-gpu, amd-gpu, and intel-gpu variables. Various online resources mention that it's not allowed to have a dash in the name of an environment variable. For example, see this post on Stack Overflow.

So to work this around in bash, something like

env 'ELPA_DEFAULT_nvidia-gpu=1' ./test_elpa.x ...

is needed, instead of the more familiar way, export ELPA_DEFAULT_nvidia-gpu=1.

It would be great if this could be clarified in the USERS_GUIDE, or even better, if nvidia_gpu, amd_gpu, and intel_gpu were created as aliases to the existing variables.
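As a side note, the shell restriction does not apply when the variable is set from inside a program; a minimal sketch (a hypothetical test program, not part of ELPA) is:

/* setenv(3) accepts '-' in variable names, unlike bash's 'export VAR=...' */
#define _POSIX_C_SOURCE 200112L
#include <stdio.h>
#include <stdlib.h>

int main(void)
{
    if (setenv("ELPA_DEFAULT_nvidia-gpu", "1", /*overwrite=*/1) != 0) {
        perror("setenv");
        return 1;
    }
    printf("ELPA_DEFAULT_nvidia-gpu=%s\n", getenv("ELPA_DEFAULT_nvidia-gpu"));
    /* ... elpa_init() and the rest of the ELPA setup would follow here ... */
    return 0;
}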

pkgconfig file name with version issue

Currently elpa installation adds

lib/libelpa.so
lib/pkgconfig/elpa-2020.05.001.pc

files.

When I use pkg_search_module in CMake to look for ELPA, it searches for elpa.pc but doesn't recognize elpa-VERSION.pc, so I have a chicken-and-egg problem: I can only search with the version, but if I haven't found ELPA yet, how do I know the version?

I'd like to see how other people solve this issue.
Or could we just install elpa.pc without the VERSION?

ELPA available on conda-forge

ELPA is now also available via the conda package manager from the conda-forge channel (https://anaconda.org/conda-forge/elpa) for Linux (x86_64, ppc64le, aarch64) and OSX (x86_64) as serial, threaded and MPI-parallel versions (both MPICH and OpenMPI).

The feedstock repository is available here. If you are interested in co-maintaining ELPA on conda-forge let me know (maintaining a package on conda-forge is not much work as most tasks are automated by bots).

Latest ELPA-2021.11.001 complains about blacsgrid

I changed to the latest ELPA and now this error is reported:

ELPA_SETUP ERROR: your provided blacsgrid is not ok!
BLACS_GRIDINFO returned an error! Aborting...

What exactly is not ok, and how should I fix the BLACS grid (4x4 in my case)?

Potential MPI issue when using GPU kernel of ELPA on a subset of MPI ranks

Dear ELPA developers,

We have been successfully using the CPU kernels of ELPA in the DFT-FE code (https://github.com/dftfeDevelopers/dftfe), where I am one of the lead developers. We use ELPA on a subset of the MPI ranks of the MPI_COMM_WORLD used by DFT-FE. In particular, we use MPI_Comm_create_group to create the new communicator, which we pass to ELPA. Recently, I have been trying to use the GPU kernels of ELPA (using elpa-2020.11.001.rc1) on the Summit supercomputer via the same route, but the code gets stuck after printing:

Initializing the GPU devices
when using a subset of the ranks. It works fine if all ranks are used.

Upon investigating the ELPA source code, I found the issue to be the call to mpi_allreduce where MPI_COMM_WORLD is used.

call mpi_allreduce(numberOfDevices, maxNumberOfDevices, 1, MPI_INTEGER, MPI_MAX, MPI_COMM_WORLD, mpierr)

#ifdef WITH_MPI
      call mpi_allreduce(numberOfDevices, maxNumberOfDevices, 1, MPI_INTEGER, MPI_MAX, MPI_COMM_WORLD, mpierr)

      if (maxNumberOfDevices .ne. numberOfDevices) then
        print *,"Different number of GPU devices on MPI tasks!"
        print *,"GPUs will NOT be used!"
        gpuAvailable = .false.
        return
      endif
#endif

Since this is a sanity check, I have temporarily bypassed the above issue by commenting out these lines of code. Now I am able to run the GPU kernel successfully on a subset of ranks.
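For illustration, the hang pattern itself can be reproduced with a minimal MPI program (my own sketch, unrelated to ELPA's sources): a collective posted on MPI_COMM_WORLD inside a routine that only a subset of the world's ranks ever calls.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, ndev = 1, maxdev;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank < 2) {
        /* Only ranks 0 and 1 reach this "sanity check"; with more than two
           ranks in total, the MPI_Allreduce below never completes, which is
           the same pattern as the check in ELPA's GPU initialization.
           Reducing over the communicator actually handed to ELPA would
           avoid the problem. */
        MPI_Allreduce(&ndev, &maxdev, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);
        printf("rank %d: max devices = %d\n", rank, maxdev);
    }

    MPI_Finalize();
    return 0;
}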

I would be very grateful for any guidance on resolving this issue cleanly, either in the way we interface with ELPA or, if possible, with a fix in the ELPA source code itself.

Thank you,
Sambit

ELPA GPU kernels are not working on A100

We have built cp2k-9.1 for NVIDIA A100 and installed elpa-2021.11.001 via its toolchain.

We get the following error message when ELPA is called from cp2k-9.1:

 Initializing the GPU devices

Found 8 GPUs
MPI rank 0 uses GPU #0
 ELPA: Warning, GPU usage has been requested but compute kernel is set by the user as non-GPU!
 The compute kernel will be executed on CPUs!

I notice that this error comes from a conditional branch around L. 796 in src/elpa2/elpa2_template.F90.
It arises if both the following variables are TRUE: WITH_REAL_NVIDIA_SM80_GPU_KERNEL and GPU_KERNEL.
We have both the normal GPU kernel and the kernel for NVIDIA A100 in our executables built via the toolchain of cp2k-9.1, and this seems to be the source of the problem.

For our purposes, it would suffice if we could run the normal GPU kernel (instead of the one for the A100).
Is it possible to stop building the new A100 GPU kernel via the configure options?

The following are our current configure options.
Our system is an Intel Xeon Platinum 8360Y (two sockets), equipped with eight A100 GPUs.
The compilers are the Intel oneAPI compilers (2021.2.0) with CUDA 11.2.

../configure --libdir="${pkg_install_dir}/${TARGET}/lib" \
   --enable-openmp=yes \
   --enable-shared=no \
   --enable-static=yes \
   ${other_kernel_flags} \
   --enable-nvidia-gpu=yes \
   --with-cuda-path=${CUDA_PATH} \
   --with-NVIDIA-GPU-compute-capability=sm_80 \
   ${other_config_flags}

I appreciate your help on this issue. Thank you in advance.

Eigenvector check: what does the check on errmax=0 do?

During make check of elpa-2023.11.001 on my PC, all complex EVP tests failed; it seems the failure comes from an if in test/shared/test_check_correctness_template.F90, line 501:

500        if (nev .ge. 2) then
501          if (errmax .gt. tol_res .or. errmax .eq. 0.0_rk) then
502            status = 1
503          endif
504        else
505          if (errmax .gt. tol_res) then
506            status = 1
507          endif
508        endif

The check errmax .eq. 0.0_rk confuses me. What does this check do? Would a zero max-error do harm in some calculation? A similar check also appears in other files, e.g. line 450 of test/shared/test_analytic_template.F90, which suggests it was put there on purpose.

EDIT: These checks seem to come from very old commits like b9bbba2, but with no further information. Would deleting them cause a bug?

Inconsistent results among 1-stage, 2-stage and ScaLAPACK

Hi, I have solved the same matrix with the 1-stage and 2-stage ELPA solvers and with ScaLAPACK, but I get three different results. It is a 240x240 symmetric matrix, and I compute all eigenvalues. All the tests passed. What could cause this?

Some of the smallest eigenvalues:

ScaLAPACK:

0.173720131800915   0.173878822327725   0.187152350855710
0.191423385819358   0.194181552429425   0.194181552844384
0.194449431187434   0.194449431797715   0.199507996641544

ELPA 2-stage:

-2.07585731052087   -2.03591585272923   -1.72390402574035
-1.66384803970337   -1.52011422574109   -1.43106927116121
-1.39056098837714   -1.29185171666780   -1.24300500320917

ELPA 1-stage:

-3.14666685601262   -3.14303290459765   -3.13546185348343
-3.00895984604710   -2.99530541682141   -2.98274897202263
-2.90288228355549   -2.90085399507448   -2.87626101080164

Procedure

BLACS initialization:

      call blacs_pinfo(my_proc, nprocessors)
      call blacs_get(0, 0, icontext)
      call blacs_gridinit(icontext, 'C', nprow, npcol)
      call blacs_gridinfo(Icontext, nprow, npcol, myrow, mycol)
      ir = max(1, numroc(s%norbitals, nb, myrow, 0, nprow))
      ic = max(1, numroc(s%norbitals, nb, mycol, 0, npcol))
      call descinit(desc_x, s%norbitals, s%norbitals, nb, nb, 0, 0,        &
      &                 icontext, ir, info)

ELPA setup and solve:

  ICTXT = desca(CTXT_para)
  MB =desca(MB_para)
  NB = desca(NB_para)
  if(MB .ne. NB) stop " not support block size not equal of row and col"
  call blacs_gridinfo(ICTXT, NPROW, NPCOL, my_prow, my_pcol)
  NLROW = numroc(n, MB, my_prow, 0, NPROW) ! number of rows contained in mine
  NLCOL = numroc(n, MB, my_pcol, 0, NPCOL)
  bandwidth = MB
  if (elpa_init(CURRENT_API_VERSION) /= ELPA_OK) then
     print *, "ELPA API version not supported"
     stop 1
  endif
  e => elpa_allocate(error_elpa)
  call e%set("na", int(n,kind=c_int), error_elpa)
  call e%set("nev", int(nev,kind=c_int), error_elpa)
  call e%set("local_nrows", int(NLROW,kind=c_int), error_elpa)
  call e%set("local_ncols", int(NLCOL,kind=c_int), error_elpa)
  call e%set("nblk", int(MB,kind=c_int), error_elpa)
  call e%set("mpi_comm_parent", int(MPI_COMM_WORLD,kind=c_int), error_elpa)
  call e%set("process_row", int(my_prow,kind=c_int), error_elpa)
  call e%set("process_col", int(my_pcol,kind=c_int), error_elpa)
  call e%set("bandwidth", int(bandwidth,kind=c_int), error_elpa)
  call e%eigenvectors(a, ev, z, error_elpa) !use environment to decide which solver is used

ifort and icc

TOTAL: 311
PASS: 130
SKIP: 109
XFAIL: 0
FAIL: 72
XPASS: 0
ERROR: 0

See ./test-suite.log
Please report to [email protected]
make[3]: *** [Makefile:84658: test-suite.log] Error 1
make[3]: Leaving directory '/home/aarav/wien2k/elpa/build'
make[2]: *** [Makefile:84766: check-TESTS] Error 2
make[2]: Leaving directory '/home/aarav/wien2k/elpa/build'
make[1]: *** [Makefile:90928: check-am] Error 2
make[1]: Leaving directory '/home/aarav/wien2k/elpa/build'
make: *** [Makefile:90930: check] Error 2

elpa-2022.11.001 compilation error

Dear @marekandreas !
We observe a problem when compiling recent ELPA with CUDA:

test/C/test.c:171:10: fatal error: ../shared/GPU/test_gpu_vendor_agnostic_layerVariables.h: No such file or directory
     13778      171 | #include "../shared/GPU/test_gpu_vendor_agnostic_layerVariables.h"
     13779          |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
     13780    compilation terminated.
  >> 13781    make[1]: *** [Makefile:79038: test/C/validate_c_version_complex_double_eigenvalues_1stage_gpu_analytic_explicit-test.o] Error 1

We use spack to compile elpa. The spec is simply "elpa%gcc +cuda cuda_arch=80"

Inconsistency in deprecated options

Unless I'm missing something, --enable-gpu-streams=nvidia requires --enable-nvidia-gpu, which is however deprecated. Using the new flag --enable-nvidia-gpu-kernels results in:

  >> 414    configure: error: If --enable-gpu-streams=nvidia is set, you must also use --enable-nvidia-gpu

PS: Is this a mirror of https://gitlab.mpcdf.mpg.de/elpa/elpa? I could not open an issue in the original repository.

deadlock when using H2O-RPA-32

Hi,
I am getting a deadlock when running the H2O-32-RI-dRPA-TZ.inp case on 16 MPI processes.

I start it like this:
MPI_PER_GPU=2 mpirun --bind-to none -n 16 binder.sh ../../../exe/local_cuda/cp2k.psmp -i H2O-32-RI-dRPA-TZ.inp ./exe/local_cuda/cp2k.psmp -i H2O-32-RI-dRPA-TZ.inp

Very quickly the program hangs after
p coordinates 3 0.000 0.000 0.6
p buffer 3 0.000 0.000 0.6
p layout 3 0.000 0.000 0.2
p allocation 2 0.000 0.000 0.0
p init 2 0.000 0.000 0.1

From the gdb stacks you can see that all 16 are calling
gdb_1431574.out:#12 0x00007f0d96a1a74f in __elpa2_impl_MOD_elpa_solve_evp_real_2stage_double_impl () from /opt/elpa/lib/libelpa_openmp.so.15

The first 4 stacks go here:
grep __elpa2_compute_MOD_bandred_real_double *.out
gdb_1431574.out:#11 0x00007f0d969f8e7b in __elpa2_compute_MOD_bandred_real_double ()
gdb_1431575.out:#14 0x00007fe14a904325 in __elpa2_compute_MOD_bandred_real_double ()
gdb_1431577.out:#10 0x00007f9bbc664325 in __elpa2_compute_MOD_bandred_real_double ()
gdb_1431581.out:#14 0x00007f2884ddd325 in
and end up in an MPI reduction.

Some of the others call __mod_check_for_gpu_MOD_check_for_gpu directly, while the rest go here:
#10 0x00007f40d09dcbbd in ompi_allreduce_f (sendbuf=0x7ffe6a2269d8 "\001",
recvbuf=0x7ffe6a2265ec "\001", count=0x7f40fb7a9d00,
datatype=, op=0x7f40fb7a9d00, comm=,
ierr=0x7ffe6a2265e8) at pallreduce_f.c:87
#11 0x00007f40fb724503 in __mod_check_for_gpu_MOD_check_for_gpu ()
from /opt/elpa/lib/libelpa_openmp.so.15
#12 0x00007f40fb7419f7 in __elpa2_impl_MOD_elpa_solve_evp_real_2stage_double_impl () from /opt/elpa/lib/libelpa_openmp.so.15
#13 0x00007f40fb6a06f7 in __elpa_impl_MOD_elpa_eigenvectors_d ()

To summarize: 12 ranks are already in PMPI_Allreduce while 4 are still doing something else.

I hope this gives you some guidance to solve this bug.
Please do not hesitate to contact me directly at Gunter Roth [email protected].
It would be a pleasure to provide any missing information, and thanks again for all your ELPA efforts.
Gunter

I am also attaching my summary file debug_H2O-32-RI-dRPA-TZ.txt:

debug_H2O-32-RI-dRPA-TZ.txt

Can I use ELPA when no_procs > matrix size?

Hi,

Question

Is it possible to use ELPA for small matrices when the number of mpi processes n_procs is larger than the number of elements in the matrix?

Explanation of situation

For practical purposes I would sometimes like to use more processes than the size of my matrix.

Currently, we can use either ELPA or ScaLAPACK to diagonalise all of the matrices in our code. The first matrix can be very small, but the second is typically 10,000 x 10,000.

For example, if the first matrix is 2x2 and I use 8 or more processes, then the calculation fails using ELPA, but with ScaLAPACK it runs.

For practical reasons I cannot share the code, but both the ScaLAPACK and ELPA runs use the same grid setup via BLACS_Gridinit, descinit, etc.

Is there a simple explanation to this?

The errors/warnings I get for the 2x2 matrix with 8 processes are:

 ELPA: Warning, block size too large for this matrix size and process grid!
 Choose a smaller block size if possible.

All the success codes are checked, including elpa%setup(), and it fails when the status returned by elpa%eigenvectors is not equal to ELPA_OK.

For a 1x1 matrix (running with 2 processes) I get a different error:

 ELPA_SETUP ERROR: your provided blacsgrid is not ok! 
 BLACS_GRIDINFO returned an error! Aborting...

I know it may seem silly to use ELPA for a 1x1 matrix, but the code is structured to use one diagonaliser for everything. With ScaLAPACK we can do this, but ELPA appears to have different criteria.

I can possibly implement a workaround to deal with small matrices but I'd prefer to make minimal changes.
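For the block-size warning specifically, a minimal sketch of such a workaround (a hypothetical helper of mine, not part of ELPA; it does not by itself address the 1x1 BLACS grid error) would cap the block size by the matrix dimension and pass the same value to descinit and to ELPA's "nblk" setting:

#include <stdio.h>

/* never use a block larger than the matrix itself, and never 0 */
static int choose_nblk(int na, int preferred_nblk)
{
    int nblk = (preferred_nblk < na) ? preferred_nblk : na;
    return (nblk > 0) ? nblk : 1;
}

int main(void)
{
    printf("na=10000 -> nblk=%d\n", choose_nblk(10000, 64));  /* 64 */
    printf("na=2     -> nblk=%d\n", choose_nblk(2, 64));      /* 2  */
    printf("na=1     -> nblk=%d\n", choose_nblk(1, 64));      /* 1  */
    return 0;
}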

The ELPA version is 2021.05.002.
I tried two Intel compilers: 19.0.0.117 20180804 and 2021.4.0 20210910.

Sorry if this is too vague. My main question is not how to fix my specific problem, but whether it is in principle possible to use ELPA for small matrices when the number of MPI processes n_procs is large.

C++ "multiple definition of" issue

Hi @marekandreas !
It is probably an issue with the header guards. To reproduce:
a.cpp:

#include <elpa/elpa.h>

int main()
{
    return 0;
}

b.cpp:

#include <elpa/elpa.h>

void foo()
{

}

gcc a.cpp b.cpp -I/path/to/elpa/include/elpa_openmp-2022.11.001.rc2/ -L/path/to/elpa/elpa/lib -lelpa_openmp leads to

/usr/bin/ld: /tmp/ccK00fbG.o: in function `elpa_set(elpa_struct*, char const*, int, int*)':
b.cpp:(.text+0x0): multiple definition of `elpa_set(elpa_struct*, char const*, int, int*)'; /tmp/cc51CaRG.o:a.cpp:(.text+0x0): first defined here
/usr/bin/ld: /tmp/ccK00fbG.o: in function `elpa_set(elpa_struct*, char const*, double, int*)':
b.cpp:(.text+0x35): multiple definition of `elpa_set(elpa_struct*, char const*, double, int*)'; /tmp/cc51CaRG.o:a.cpp:(.text+0x35): first defined here
/usr/bin/ld: /tmp/ccK00fbG.o: in function `elpa_get(elpa_struct*, char const*, int*, int*)':
b.cpp:(.text+0x75): multiple definition of `elpa_get(elpa_struct*, char const*, int*, int*)'; /tmp/cc51CaRG.o:a.cpp:(.text+0x75): first defined here
/usr/bin/ld: /tmp/ccK00fbG.o: in function `elpa_get(elpa_struct*, char const*, double*, int*)':
b.cpp:(.text+0xac): multiple definition of `elpa_get(elpa_struct*, char const*, double*, int*)'; /tmp/cc51CaRG.o
...

Deadlock in ELPA-gpu when making consecutive calls with different comm sizes (due to check_for_gpu()'s MPI_Allreduce)

Hi,

I have observed that CP2K deadlocks with certain rank counts when running with the ELPA backend. I don't know ELPA very well, but with the debugger I think I could gather enough info to pinpoint the problem.

This is what happens in the run, I run with 8 MPI ranks.

  1. CP2K calls ELPA solver with a comm of size 4.
  2. ELPA initializes the GPUs for the 4 ranks in check_for_gpu() #442.
  3. ELPA solver succeeds, execution continues normally.
  4. Later, CP2K calls ELPA solver again, but with a comm of size 8
  5. Now there is a deadlock at check_for_gpu() #507. As the first 4 ranks were already initialized, they have exited the function earlier #453, and do not reach the Allreduce, so the last 4 ranks will hang there forever.

I don't know whether calling ELPA with different comm sizes is allowed or not, but my first thought would be that check_for_gpu() should first query the value of all ranks' gpuIsInitialized and restart the initialization for everyone if any of the ranks was not initialized; this seems to fix the deadlock in my case (see the sketch below).
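A conceptual sketch of that pattern (written in C for brevity, whereas ELPA's check_for_gpu is Fortran; the names are illustrative, not ELPA's actual code):

#include <mpi.h>

static int gpu_is_initialized = 0;   /* per-rank state, as in check_for_gpu() */

void check_for_gpu(MPI_Comm comm)
{
    int all_initialized = 0;

    /* Do not early-return based on the local flag alone (that is what causes
       the deadlock when the communicator grows); agree collectively first. */
    MPI_Allreduce(&gpu_is_initialized, &all_initialized, 1, MPI_INT,
                  MPI_MIN, comm);

    if (!all_initialized) {
        /* ... device counting / handle creation would go here ... */
        gpu_is_initialized = 1;
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    check_for_gpu(MPI_COMM_WORLD);   /* safe for any subset communicator too */
    MPI_Finalize();
    return 0;
}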

How to run c test?

Hi,
I know it may seem as a novice question but I am having difficulties running the C tests.
If I am not mistaken I need to compile the tests inside the /test/C folder right? how do I compile them?
I have installed elpa using https://xconfigure.readthedocs.io/en/latest/elpa/

I would really love the help. Thanks in advance!

Compilation error: "cannot find -ludev"

Hello,

I am trying to compile elpa-2021.11.002 on my HPC cluster. I have the following modules loaded:

  • intel/21.2.0
  • intel-mkl/2021.2.0
  • openmpi/4.1.1
  • hwloc/1.11.8

Running make, the program compiles for a while, and eventually stops with this error:

GEN      libelpa.la
ld: cannot find -ludev
make[1]: *** [libelpa.la] Error 1
make[1]: Leaving directory `/net/fs2k02/srv/export/kaxiras/share_root/dbennett/elpa-2021.11.002/build'
make: *** [all] Error 2

I tried configuring using the instructions linked here and here. Also, I tried simply running configure, specifying the compilers and linking to ScaLAPACK following the instructions in INSTALL.md:

FC=mpifort CC=mpicc ../configure \
SCALAPACK_LDFLAGS="-L$MKL_HOME/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -Wl,-rpath,$MKL_HOME/lib/intel64" \
SCALAPACK_FCFLAGS="-L$MKL_HOME/lib/intel64 -lmkl_scalapack_lp64
-lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64
-lpthread -lm -I$MKL_HOME/include/intel64/lp64"

But both give the same error. I couldn't find any reference to libudev in the code, and I'm not really sure why it's being linked against.

Any advice is much appreciated.

Log files and configure script attached here

config.txt
configure-elpa-skx.txt
make.txt

Thanks,

Daniel Bennett

Error in executing copy_double_complex_a_tmatc_kernel

Dear @marekandreas !
I'm running the latest release of ELPA on GPUs and I'm getting this error message:

Error in executing copy_double_complex_a_tmatc_kernel: invalid configuration argument

However, it seems to be harmless, as the calculation runs fine and all the data seems to be properly copied. This happens on 4x4 and 2x2 BLACS grids and with different matrix sizes.

With kind regards,
Anton.

Compile error with AMD GPU and MPI: incomplete fortran_constants.F90 file and "Symbol 'elpa_2stage_complex_nvidia_gpu' has no IMPLICIT type"

Hi, I am compiling ELPA with AMD GPU and MPI support but get the following two errors. How can I resolve this? Thank you!

 PPFC     src/helpers/libelpa_private_la-mod_precision.lo
  PPFC     src/helpers/libelpa_private_la-mod_omp.lo
  PPFC     src/GPU/CUDA/libelpa_private_la-mod_cuda.lo
  PPFC     src/GPU/ROCm/libelpa_private_la-mod_hip.lo
  PPFC     src/libelpa_private_la-elpa_generated_fortran_interfaces.lo
  PPFC     src/libelpa_public_la-elpa_constants.lo
./src/fortran_constants.F90:2:35:

    2 |  integer(kind=C_INT), parameter ::
      |                                   1
Error: Invalid character in name at (1)
./src/fortran_constants.F90:4:35:

... and more. The whole fortran_constants.F90 file looks like this.

and

../src/elpa_constants.F90:57:99:

   57 |   integer(kind=C_INT), parameter           :: ELPA_2STAGE_REAL_GPU    = ELPA_2STAGE_REAL_NVIDIA_GPU
      |                                                                                                   1
Error: Symbol 'elpa_2stage_real_nvidia_gpu' at (1) has no IMPLICIT type; did you mean 'elpa_2stage_real_gpu'?
../src/elpa_constants.F90:58:102:

   58 |   integer(kind=C_INT), parameter           :: ELPA_2STAGE_COMPLEX_GPU = ELPA_2STAGE_COMPLEX_NVIDIA_GPU
      |                                                                                                      1
Error: Symbol 'elpa_2stage_complex_nvidia_gpu' at (1) has no IMPLICIT type; did you mean 'elpa_2stage_complex_gpu'?
make[1]: *** [Makefile:75736: src/libelpa_public_la-elpa_constants.lo] Error 1
make[1]: Leaving directory '/work1/jrf/tool/elpa/build-mpi-gpu'

My configuration is gcc (GCC) 11.2.1 / ROCm 4.3.0 / Intel MPI 2021:

 FC=mpif90  CC=mpicc CXX=hipcc   ../configure \
 CPP="gcc -E" \
 FCFLAGS="-g -O3  " \
 CXXFLAGS=" -O3 -DROCBLAS_V3 -D__HIP_PLATFORM_AMD__ --offload-arch=gfx90a -g -O3 -std=c++17 "  \
 CFLAGS="-O3  -g -O3 -std=c++17  " \
 --disable-mpi-module \
 --enable-option-checking=fatal \
 LIBS="-L$ROC_HOME/lib -Wl,-rpath=$ROC_HOME/lib -L$ROC_HOME/hip/lib -Wl,-rpath=$ROC_HOME/hip/lib -lamdhip64 -fPIC -lrocblas" \
 --with-mpi=yes --disable-sse --disable-sse-assembly --disable-avx --disable-avx2 --disable-avx512 \
 --enable-amd-gpu --enable-single-precision --enable-gpu-streams=amd --enable-hipcub --disable-cpp-tests --with-rocsolver \
 SCALAPACK_LDFLAGS=" -L$MKL_HOME/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread " \
 SCALAPACK_FCFLAGS="-I$MKL_HOME/include  -I$MKL_HOME/include/intel64/lp64 -L$MKL_HOME/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread " 

Memory leak with CUDA in 2022.11.001.rc2

The latest 2022.11.001.rc2 release with --enable-nvidia-gpu=yes has a memory leak:

=================================================================
==1088853==ERROR: LeakSanitizer: detected memory leaks

Direct leak of 8 byte(s) in 1 object(s) allocated from:
    #0 0x7fb6bfa2c302 in __interceptor_malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:75
    #1 0x562628e6b2e1 in cublasCreateFromC (/home/ole/git/cp2k/exe/local_cuda/cp2k.pdbg+0x3dff2e1)
    #2 0x562628e6d0e7 in __mod_check_for_gpu_MOD_check_for_gpu ../src/GPU/./handle_creation_template.F90:25
    #3 0x562628eb7ac1 in __elpa2_impl_MOD_elpa_solve_evp_real_2stage_a_h_a_double_impl ../src/elpa2/elpa2_template.F90:422
    #4 0x562628e2995a in __elpa_impl_MOD_elpa_eigenvectors_a_h_a_d ../src/elpa_impl_math_solvers_template.F90:126
    #5 0x562627c75aef in cp_fm_diag_elpa_base /home/ole/git/cp2k/src/fm/cp_fm_elpa.F:537
...

Is cholesky decomposition with gpusolver not implemented yet?

Issue

I compiled ELPA-2023.05.001 on my computer with an AMD MI50, and found that the program fails all tests of Cholesky decomposition on GPU with a segmentation fault. Following the backtrace, the problem seems to be caused by:

src/elpa_impl_generalized_transform_template.F90: this template calls the all-host-array routine without checking whether the work is on the GPU.
117     if (.not. is_already_decomposed) then
118       ! B = U^T*U, B<-U
119       call self%elpa_cholesky_a_h_a_&
120           &ELPA_IMPL_SUFFIX&
121           &(b, error)
src/cholesky/elpa_cholesky_template.F90: this template checks the work type via call obj%get("gpu_cholesky", gpu_cholesky, error) instead of checking the GPU-related macros, which makes the a_h_a call from above try to use the GPU, causing the segmentation fault.
157  call obj%get("gpu_cholesky",gpu_cholesky, error)
158  if (error .ne. ELPA_OK) then
159    write(error_unit,*) "ELPA_CHOLESKY: Problem getting option for gpu_cholesky. Aborting..."
160    success = .false.
161    return
162  endif
163
164  if (gpu_cholesky .eq. 1) then
165    useGPU = (gpu == 1)
166  else
167    useGPU = .false.
168  endif

Would it be OK to simply add a GPU-usage check in either file, or is there further work on gpusolver in progress? Options for gpusolver do not appear in INSTALL.md but do show up in configure --help; is it usable for other routines (e.g. eigenvectors) now?

Detailed output

All outputs and the call stack are in the log file:
validate_c_version_complex_double_generalized_1stage_gpu_random_default.sh.log

Steps to reproduce the issue

Source the script below, make, and make check.

Configure-amdclang.txt

I am using ROCmCC instead of hipcc to compile the CPU kernels written in C. It seems CXX could simply be replaced with hipcc to avoid editing the Makefile, but I have not tested this yet.

Check for MPI threading support incorrect

In the file elpa_impl.F90, the check for the MPI thread level support currently looks like this:

      if ((providedMPI .ne. MPI_THREAD_SERIALIZED) .and. (providedMPI .ne. MPI_THREAD_MULTIPLE)) then
#if defined(ALLOW_THREAD_LIMITING)
        write(error_unit,*) "WARNING elpa_setup: MPI threading level MPI_THREAD_SERALIZED or MPI_THREAD_MULTIPLE required but &
...

I think the check should be

if ((providedMPI .ne. MPI_THREAD_SERIALIZED) .or. (providedMPI .ne. MPI_THREAD_MULTIPLE)) then

or, as the levels are ordered

if (providedMPI .lt. MPI_THREAD_SERIALIZED) then
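For illustration, the ordered form maps to the following minimal sketch (not ELPA code; the MPI standard guarantees MPI_THREAD_SINGLE < MPI_THREAD_FUNNELED < MPI_THREAD_SERIALIZED < MPI_THREAD_MULTIPLE):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    /* "at least MPI_THREAD_SERIALIZED" as a single ordered comparison */
    if (provided < MPI_THREAD_SERIALIZED) {
        printf("WARNING: MPI_THREAD_SERIALIZED or MPI_THREAD_MULTIPLE required,"
               " but only level %d is provided\n", provided);
    }

    MPI_Finalize();
    return 0;
}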
