marekandreas / elpa
A scalable eigensolver for dense, symmetric (Hermitian) matrices (fork of https://gitlab.mpcdf.mpg.de/elpa/elpa.git)
License: Other
I am trying to install ELPA 2021.05.001 using the GNU C and Fortran compilers on a server with Intel Xeon Gold CPUs and NVIDIA V100 GPUs. For some reason the configure script is unable to find the library containing cublasDgemm, even though it detects "nvcc", as you can see in the attached screenshot. All the CUDA paths are clearly specified, as shown in the attached config.log.
I can't figure out what I am missing here!
config.log
checking whether we can compile AVX512 gcc intrinsics in C... no
configure: error: Could not compile a test program with AVX512, adjust the C compiler or CFLAGS. Possibly (some of) the flags " -mmmx -msse -msse2 -mssse3 -msse4.1 -msse4.2 " solve this issue
===> Script "configure" failed unexpectedly.
config.log:
configure:13737: checking whether we can compile AVX512 gcc intrinsics in C
configure:13751: cc -c -fopenmp -O2 -pipe -fno-omit-frame-pointer -mmmx -msse -msse2 -mssse3 -msse4.1 -msse4.2 -mavx -msse3 -fstack-protector-strong -isystem /usr/local/include -fno-strict-aliasing -fno-omit-frame-pointer -isystem /usr/local/include conftest.c >&5
conftest.c:47:17: error: always_inline function '_mm512_load_pd' requires target feature 'avx512f', but would be inlined into function 'main' that is compiled without support for 'avx512f'
__m512d q1 = _mm512_load_pd(q);
^
conftest.c:47:17: error: AVX vector return of type '__m512d' (vector of 8 'double' values) without 'avx512f' enabled changes the ABI
conftest.c:48:17: error: always_inline function '_mm512_fmadd_pd' requires target feature 'avx512f', but would be inlined into function 'main' that is compiled without support for 'avx512f'
__m512d y1 = _mm512_fmadd_pd(q1, q1, q1);
^
conftest.c:48:17: error: AVX vector argument of type '__m512d' (vector of 8 'double' values) without 'avx512f' enabled changes the ABI
4 errors generated.
configure:13751: $? = 1
configure: failed program was:
| /* confdefs.h */
| #define PACKAGE_NAME "elpa"
[...]
configure:13758: result: no
configure:13761: error: Could not compile a test program with AVX512, adjust the C compiler or CFLAGS. Possibly (some of) the flags " -mmmx -msse -msse2 -mssse3 -msse4.1 -msse4.2 " solve this issue
OS: FreeBSD 13
2021.05.002_bugfix breaks with this log.
Configure environment:
F77="gfortran10" FC="gfortran10" FFLAGS="-O -Wl,-rpath=/usr/local/lib/gcc10" FCFLAGS="-std=legacy -I/usr/local/include -I/disk-samsung/freebsd-ports/math/elpa/work/elpa-2021.05.002_bugfix -Isrc -Isrc/general -Wl,-rpath=/usr/local/lib/gcc10" MAKE=gmake ac_cv_path_PERL=/usr/local/bin/perl ac_cv_path_PERL_PATH=/usr/local/bin/perl PERL_USE_UNSAFE_INC=1 XDG_DATA_HOME=/disk-samsung/freebsd-ports/math/elpa/work XDG_CONFIG_HOME=/disk-samsung/freebsd-ports/math/elpa/work XDG_CACHE_HOME=/disk-samsung/freebsd-ports/math/elpa/work/.cache HOME=/disk-samsung/freebsd-ports/math/elpa/work PATH=/disk-samsung/freebsd-ports/math/elpa/work/.bin:/home/yuri/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/local/sbin:/usr/local/bin SHELL=/bin/sh CONFIG_SHELL=/bin/sh ADDR2LINE="/usr/local/bin/addr2line" AR="/usr/local/bin/ar" AS="/usr/local/bin/as" CPPFILT="/usr/local/bin/c++filt" GPROF="/usr/local/bin/gprof" LD="/usr/local/bin/ld" NM="/usr/local/bin/nm" OBJCOPY="/usr/local/bin/objcopy" OBJDUMP="/usr/local/bin/objdump" RANLIB="/usr/local/bin/ranlib" READELF="/usr/local/bin/readelf" SIZE="/usr/local/bin/size" STRINGS="/usr/local/bin/strings" CMAKE_PREFIX_PATH="/usr/local" CONFIG_SITE=/disk-samsung/freebsd-ports/Templates/config.site lt_cv_sys_max_cmd_len=524288
Configure arguments:
--disable-avx --disable-avx2 --disable-avx512 --disable-static --disable-c-tests --without-mpi --disable-openmp --disable-sse --disable-sse-assembly --prefix=/usr/local ${_LATE_CONFIGURE_ARGS}
gcc-10
FreeBSD 13
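Two possible workarounds, sketched below and untested on FreeBSD. Note that the configure arguments above already pass --disable-avx512, so the fact that the AVX512 probe still runs looks like the underlying bug; the flags and paths here are placeholders, not a verified port fix.

```shell
# If the CPU actually supports AVX512, satisfy the probe by adding
# the flag it is missing (configure itself suggests the SSE flags,
# but the failing test needs avx512f):
./configure CFLAGS="${CFLAGS} -mavx512f" ...

# Alternatively, disable the AVX512 kernels entirely, as the port
# already intends:
./configure --disable-avx512 --disable-avx512-kernels ...
```

The "..." stands for the remaining configure arguments from the port's environment.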
Hi, the documentation says "multi-GPU runs on multiple nodes have not been tested". Is this still the case for ELPA? We want to use ELPA on an AMD GPU cluster; can ELPA do this now?
Due to -Werror we are getting these errors:
cpp: error: cpp: error: argument unused during compilation: '-P' [-Werror,-Wunused-command-line-argument]
Log: http://beefy18.nyi.freebsd.org/data/main-amd64-default/pd83453f82622_sd5c1296234/logs/elpa-2019.05.002_2.log (IPv6 URL)
Hello, I would like to report three different issues found when compiling ELPA (2022.11.001) on the LUMI supercomputer.
I have successfully managed to compile and run the libraries (both with and without AMD GPU support) after addressing these appropriately.
elpa/src/elpa_impl_math_template.F90
Lines 889 to 896 in bea1f0f
According to the GNU standards, there cannot be a comment-only line within a line continuation. While I think that Cray is being very pedantic about it, changing those lines to something like the following fixes the issue:
subroutine elpa_solve_tridiagonal_&
&ELPA_IMPL_SUFFIX&
#ifdef REALCASE
#ifdef DOUBLE_PRECISION_REAL
&_c(handle, d_p, e_p, q_p, error) bind(C, name="elpa_solve_tridiagonal_d")
#endif
#ifdef SINGLE_PRECISION_REAL
&_c(handle, d_p, e_p, q_p, error) bind(C, name="elpa_solve_tridiagonal_f")
#endif
#endif
#ifdef COMPLEXCASE
#ifdef DOUBLE_PRECISION_COMPLEX
&_c(handle, d_p, e_p, q_p, error) !bind(C, name="elpa_solve_tridiagonal_dc")
#endif
#ifdef SINGLE_PRECISION_COMPLEX
&_c(handle, d_p, e_p, q_p, error) !bind(C, name="elpa_solve_tridiagonal_fc")
#endif
#endif
CPP tests fail to compile with a "source file is not valid UTF-8" fatal error. It seems as if the compiler were trying to recompile from the object files instead of from the source *.cpp files.
C tests fail with the same error in all of the validate_* files, probably due to a linking issue:
ld.lld: error: ./.libs/libelpa.so: undefined reference to .omp_offloading.img_size.cray_amdgcn-amd-amdhsa [--no-allow-shlib-undefined]
ld.lld: error: ./.libs/libelpa.so: undefined reference to .omp_offloading.img_cache.cray_amdgcn-amd-amdhsa [--no-allow-shlib-undefined]
ld.lld: error: ./.libs/libelpa.so: undefined reference to .omp_offloading.img_start.cray_amdgcn-amd-amdhsa [--no-allow-shlib-undefined]
ld.lld: error: ./.libs/libelpa.so: undefined reference to .omp_offloading.img_size.cray_amdgcn-amd-amdhsa [--no-allow-shlib-undefined]
ld.lld: error: ./.libs/libelpa.so: undefined reference to .omp_offloading.img_cache.cray_amdgcn-amd-amdhsa [--no-allow-shlib-undefined]
ld.lld: error: ./.libs/libelpa.so: undefined reference to .omp_offloading.img_start.cray_amdgcn-amd-amdhsa [--no-allow-shlib-undefined]
postdeps="-lc -lcsup -lgcc_eh -lm -lclang_rt.craypgo-x86_64 -lgcc -lclang_rt.builtins-x86_64
-L -L
"
This results in a compiler crash saying that "-L" cannot be empty. I could not trace the whole thing, but apparently something happens when configure parses the environment.
My workaround for now is to remove those extra "-L" entries from libtool after the configure step is done.
If I can provide any additional information on this, please do let me know!
Log file attached:
config.log
I downloaded elpa-2022.11.001.rc1 and tried the following:
cd elpa
mkdir build
cd build
../configure FC=ftn CC=cc CXX=CC --enable-nvidia-gpu --with-cuda-path=$CUDA_HOME --with-NVIDIA-GPU-compute-capability=sm_80 --disable-sse-assembly --disable-sse
make -j4
Ran into this error:
../src/elpa_api.F90:1117:2: fatal error: ./GPU/handle_destruction_template.F90: No such file or directory
Could it be that somebody forgot to commit the file GPU/handle_destruction_template.F90? I guess this also indicates a problem with the CI.
It is acknowledged that --enable-gpu=yes is legacy (given that other GPUs are starting to gain support in ELPA). However, it would be nice to carry forward the previous set of PARAMETER values (or enumeration values) if a legacy configuration is still allowed. For Fortran code, the module can be used or the header file (constants) can be included. For the latter, one can check whether, for instance, ELPA_2STAGE_REAL_GPU works (#if defined(ELPA_2STAGE_REAL_GPU)), but this is not straightforward or even possible with a pure module-file approach.
This report is also about, for instance, ELPA_2STAGE_REAL_GPU being missing if --enable-gpu=no is given at configuration time. With the 2020 release of ELPA this apparently worked, i.e., ELPA_2STAGE_REAL_GPU was part of the Fortran module. I wonder how this is ever valid for cross-compiled module files. Should support for an ELPA module file be dropped, or should a request for ELPA_2STAGE_REAL_GPU be redirected to something else (a fallback)?
This is an ask to keep the ELPA interface as stable as possible, to change it less frequently, or to (alternatively/finally) stop exposing internals of ELPA as part of the installed bits (the "interface"). If, for instance, elpa_constants.h is exposed, it should be rather stable or not exist at all.
This issue can be of interest for the (upcoming) CP2K 8.2 (@mkrack @oschuett @dev-zero @alazzaro).
config.log
Trying to build elpa-2021.05.002 on my personal machine (Ubuntu 22.04 LTS) with an Intel Core i7-9750H and an NVIDIA GeForce GTX 1650.
CUDA is installed in /usr/local/cuda and nvcc --version works perfectly. /usr/local/cuda/lib64/libcublas.so also exists. But configure errors out, unable to link to cublas. Please see the attached config.log.
Edit: attached screenshot
Thanks in advance
I get a compiler error in the file elpa_index.c using gcc 7.3:
#ifdef WITH_AMD_GPU_VERSION
#define default_max_stored_rows 256
#else
int const default_max_stored_rows = 256;
#endif
Old compiler versions don't treat this kind of definition as a constant, and you can't use such variables to initialize structures at compile time. Using a "#define" instead of an "int const" works with any compiler.
Hi, can ELPA be compiled with AMD GPU support and MPI? I can only find instructions for compiling with MPI and CUDA in INSTALL.md.
I am one of the CP2K developers and I am currently attempting to enforce block sizes of powers of 2 whenever we employ ELPA to solve eigenvalue problems (see cp2k/cp2k#2407). The code works fine on CPU but not on GPU, where ELPA occasionally throws
ELPA2: bandred returned an error. Aborting...
Problem getting option for debug settings. Aborting...
It happens repeatedly with different kinds of tests. We suspect that ELPA was never run on GPU in these cases.
Hi, I compiled ELPA with Intel MPI, Intel Fortran (2021), and gcc/g++ (11.2), and hit the following error:
make[1]: Entering directory '/work1/jrf/tool/elpa/build-mpi'
PPFC src/libelpa_public_la-elpa_constants.lo
manually_preprocessed_.._src_elpa_constants.F90-src_.libs_libelpa_public_la-elpa_constants.o.F90(67): error #5082: Syntax error, found END-OF-STATEMENT when expecting one of: %FILL <IDENTIFIER>
integer(kind=C_INT), parameter ::
-----------------------------------^
[the same error #5082 repeats for the parameter declarations on lines 69 through 99 of the preprocessed file]
manually_preprocessed_.._src_elpa_constants.F90-src_.libs_libelpa_public_la-elpa_constants.o.F90(102): error #5276: Unbalanced parentheses
)
^
manually_preprocessed_.._src_elpa_constants.F90-src_.libs_libelpa_public_la-elpa_constants.o.F90(102): error #5082: Syntax error, found ')' when expecting one of: <LABEL> <END-OF-STATEMENT> ; <IDENTIFIER> TYPE MODULE ELEMENTAL IMPURE NON_RECURSIVE ...
)
^
[the same error #5082 repeats for the parameter declarations on lines 105 through 129 of the preprocessed file]
/tmp/ifort9zzC8J.i90(133): catastrophic error: Too many errors, exiting
What could cause this?
The compile command:
FC=mpiifort CC=mpicc CXX=mpicxx ../configure \
FCFLAGS="-O3 -march=core-avx2 " \
CFLAGS=" -O3 -mavx2 -mfma -funsafe-loop-optimizations -funsafe-math-optimizations -ftree-vect-loop-version -ftree-vectorize" CXXFLAGS="-O3 -mfma -funsafe-loop-optimizations -funsafe-math-optimizations -ftree-vect-loop-version -ftree-vectorize -mavx2 " \
--enable-option-checking=fatal \
SCALAPACK_LDFLAGS=" -L$MKL_HOME/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread " \
SCALAPACK_FCFLAGS="-I$MKL_HOME/include -I$MKL_HOME/include/intel64/lp64 -L$MKL_HOME/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread " \
--enable-avx2 --disable-avx512-kernels
According to src/elpa1/elpa1_template.F90, line 386, gpuBLAS with matrix redistribution is not implemented; trying to use it results in a memcpy issue and ends in a segmentation fault. Would it be possible to check the configure options in a release and disable this at configure time instead of during testing?
I have been successful in using ELPA2 on an NVIDIA GPU for the type "double". However, the same test code fails for other data types such as "float" with the error
Assertion `error_elpa==ELPA_OK' failed
after the call to
elpa_eigenvectors_float(handle, a, ev, z, &error_elpa);
My understanding is that the only difference between the "double" version and the "float" version is the datatype of a, ev, and z, which should match the function call. But maybe I am missing something else?
I tried keeping the data (a, ev, z) on the CPU and on the GPUs, with the same result.
The following error is displayed when trying spack install elpa %[email protected] ^[email protected] ^intel-mkl threads=openmp
1 error found in build log:
220 configure: WARNING: * allow ELPA at runtime to change the number of threads to 1 by setting "--enable-runtime-threading-support-checks
221 configure: WARNING: --enable-allow-thread-limiting --without-threading-support-check-during-build": this will ensure correct results, but
222 configure: WARNING: maybe not the best performance (depends on the threading of your blas/lapack libraries), see the USER_GUIDE
223 configure: WARNING: * switch of the checking of threading support "--disable-runtime-threading-support-checks
224 configure: WARNING: --without-threading-support-check-during-build: DO THIS AT YOUR OWN RISK! This will be fast, but might
225 configure: WARNING: (depending on your MPI library sometimes) lead to wrong results
>> 226 configure: error: You do have to take an action of the choices above!
With the --enable-gpu-streams=amd configure flag, validate_complex_double_eigenvectors_2stage_default_kernel_gpu_analytic_default.sh (indeed, every 2-stage GPU kernel test) fails with "Invalid DeviceId less than 0" from the HIP runtime.
The backtrace looks like this:
:0:/tmp/yizeyi18/spack-stage/spack-stage-hip-5.7.0-qjrvxutegybi3rhpu4mctpd2temd7krn/spack-src/clr/hipamd/src/hip_fatbin.hpp:50 : 614193288254 us: [pid:509108 tid:0x155546b83cc0] Invalid DeviceId less than 0
Thread 1 "abacus" received signal SIGABRT, Aborted.
__pthread_kill_implementation (no_tid=0, signo=6, threadid=23456002882752) at ./nptl/pthread_kill.c:44
44 ./nptl/pthread_kill.c: No such file or directory.
(gdb) bt
#0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=23456002882752) at ./nptl/pthread_kill.c:44
#1 __pthread_kill_internal (signo=6, threadid=23456002882752) at ./nptl/pthread_kill.c:78
#2 __GI___pthread_kill (threadid=23456002882752, signo=signo@entry=6) at ./nptl/pthread_kill.c:89
#3 0x0000155551842476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26
#4 0x00001555518287f3 in __GI_abort () at ./stdlib/abort.c:79
#5 0x00001555033018e6 in hip::FatBinaryInfo::DeviceIdCheck (device_id=device_id@entry=-1, this=<optimized out>)
at /tmp/yizeyi18/spack-stage/spack-stage-hip-5.7.0-qjrvxutegybi3rhpu4mctpd2temd7krn/spack-src/clr/hipamd/src/hip_fatbin.hpp:50
#6 hip::FatBinaryInfo::BuildProgram (this=<optimized out>, device_id=device_id@entry=-1)
at /tmp/yizeyi18/spack-stage/spack-stage-hip-5.7.0-qjrvxutegybi3rhpu4mctpd2temd7krn/spack-src/clr/hipamd/src/hip_fatbin.cpp:335
#7 0x000015550330576e in hip::Function::getStatFunc (this=0x5555564bffc0, hfunc=hfunc@entry=0x7fffffff7b18, deviceId=deviceId@entry=-1)
at /tmp/yizeyi18/spack-stage/spack-stage-hip-5.7.0-qjrvxutegybi3rhpu4mctpd2temd7krn/spack-src/clr/hipamd/src/hip_global.cpp:132
#8 0x00001555032be473 in hip::StatCO::getStatFunc (this=0x555555e25cd0, hfunc=hfunc@entry=0x7fffffff7b18,
hostFunction=hostFunction@entry=0x155555435378 <my_unpack_c_hip_kernel_complex_double(int, int, int, int, int, HIP_vector_type<double, 2u>*, HIP_vector_type<double, 2u>*, int)>,
deviceId=deviceId@entry=-1) at /tmp/yizeyi18/spack-stage/spack-stage-hip-5.7.0-qjrvxutegybi3rhpu4mctpd2temd7krn/spack-src/clr/hipamd/src/hip_code_object.cpp:848
#9 0x000015550344b54c in PlatformState::getStatFunc (this=<optimized out>, hfunc=hfunc@entry=0x7fffffff7b18,
hostFunction=hostFunction@entry=0x155555435378 <my_unpack_c_hip_kernel_complex_double(int, int, int, int, int, HIP_vector_type<double, 2u>*, HIP_vector_type<double, 2u>*, int)>,
deviceId=deviceId@entry=-1) at /tmp/yizeyi18/spack-stage/spack-stage-hip-5.7.0-qjrvxutegybi3rhpu4mctpd2temd7krn/spack-src/clr/hipamd/src/hip_platform.cpp:858
#10 0x000015550344b5a8 in ihipLaunchKernel (hostFunction=0x155555435378 <my_unpack_c_hip_kernel_complex_double(int, int, int, int, int, HIP_vector_type<double, 2u>*, HIP_vector_type<double, 2u>*, int)>,
gridDim=..., blockDim=..., args=0x7fffffff8080, sharedMemBytes=0, stream=0x1555036aa180 <vtable for hip::Stream+16>, startEvent=0x0, stopEvent=0x0, flags=0)
at /tmp/yizeyi18/spack-stage/spack-stage-hip-5.7.0-qjrvxutegybi3rhpu4mctpd2temd7krn/spack-src/clr/hipamd/src/hip_platform.cpp:568
#11 0x0000155503421cb2 in hipLaunchKernel_common (hostFunction=<optimized out>,
hostFunction@entry=0x155555435378 <my_unpack_c_hip_kernel_complex_double(int, int, int, int, int, HIP_vector_type<double, 2u>*, HIP_vector_type<double, 2u>*, int)>, gridDim=..., blockDim=...,
args=<optimized out>, args@entry=0x7fffffff8080, sharedMemBytes=<optimized out>, stream=<optimized out>)
at /tmp/yizeyi18/spack-stage/spack-stage-hip-5.7.0-qjrvxutegybi3rhpu4mctpd2temd7krn/spack-src/clr/hipamd/src/hip_module.cpp:672
#12 0x000015550342c333 in hipLaunchKernel (hostFunction=0x155555435378 <my_unpack_c_hip_kernel_complex_double(int, int, int, int, int, HIP_vector_type<double, 2u>*, HIP_vector_type<double, 2u>*, int)>,
gridDim=..., blockDim=..., args=<optimized out>, sharedMemBytes=<optimized out>, stream=<optimized out>)
at /tmp/yizeyi18/spack-stage/spack-stage-hip-5.7.0-qjrvxutegybi3rhpu4mctpd2temd7krn/spack-src/clr/hipamd/src/hip_module.cpp:679
#13 0x00001555552391c9 in launch_my_unpack_c_hip_kernel_complex_double (row_count=<optimized out>, n_offset=<optimized out>, max_idx=<optimized out>, stripe_width=<optimized out>, a_dim2=<optimized out>,
stripe_count=<optimized out>, l_nev=<optimized out>, row_group_dev=<optimized out>, a_dev=<optimized out>, my_stream=<optimized out>) at ../src/GPU/ROCm/hipUtils_template.cpp:356
#14 0x000015555514838c in gpu_c_kernel::launch_my_unpack_gpu_kernel_complex_double (row_count=<error reading variable: Cannot access memory at address 0x4d5f3241444d5f32>, n_offset=0, max_idx=615,
stripe_width=1024, a_dim2=2880, stripe_count=<error reading variable: Cannot access memory at address 0x5756524c5f314c44>, l_nev=<optimized out>, row_group_dev=<optimized out>, a_dev=<optimized out>,
my_stream=<optimized out>) at ../src/elpa2/GPU/interface_c_gpu_kernel.F90:376
#15 0x0000155555149534 in pack_unpack_gpu::unpack_row_group_complex_gpu_double (obj=..., row_group_dev=23447635755008, a_dev=23437523288064, stripe_count=1, stripe_width=1024, last_stripe_width=615,
a_dim2=2880, l_nev=615, rows=<error reading variable: value requires 629760 bytes, which is more than max-value-size>, n_offset=0, row_count=64, wantdebug=.FALSE., allcomputeongpu=.FALSE.,
my_stream=93825182226512) at ../src/elpa2/pack_unpack_gpu.F90:362
#16 0x000015555514a84d in pack_unpack_gpu::unpack_and_prepare_row_group_complex_gpu_double (obj=..., row_group=<error reading variable: value requires 629760 bytes, which is more than max-value-size>,
row_group_dev=23447635755008, a_dev=23437523288064, stripe_count=1, stripe_width=1024, last_stripe_width=615, a_dim2=2880, l_nev=615, row_group_size=64, nblk=64, unpack_idx=64, next_unpack_idx=65,
force=.FALSE., wantdebug=.FALSE., allcomputeongpu=.FALSE., my_stream=93825182226512) at ../src/elpa2/pack_unpack_gpu.F90:429
#17 0x00001555550e5c21 in elpa2_compute::trans_ev_tridi_to_band_complex_double (obj=..., na=2816, nev=615, nblk=64, nbw=64, q=..., ldq=2816, matrixcols=2816,
hh_trans=<error reading variable: value requires 64880640 bytes, which is more than max-value-size>, my_pe=0, mpi_comm_rows=14, mpi_comm_cols=15, wantdebug=.FALSE., usegpu=.TRUE., max_threads_in=1,
success=.TRUE., kernel=23) at ../src/elpa2/elpa2_trans_ev_tridi_to_band_template.F90:1140
#18 0x000015555519af5e in elpa2_impl::elpa_solve_evp_complex_2stage_a_h_a_double_impl (obj=..., aextern=..., evextern=..., qextern=...) at ../src/elpa2/elpa2_template.F90:1403
#19 0x000015555504f64d in elpa_impl::elpa_eigenvectors_a_h_a_dc (self=..., a=..., ev=..., q=..., error=32767) at ../src/elpa_impl_math_solvers_template.F90:126
#20 0x0000155555058d0b in elpa_impl::elpa_eigenvectors_a_h_a_dc_c (handle=<optimized out>, a_p=<optimized out>, ev_p=<optimized out>, q_p=<optimized out>, error=32767)
at ../src/elpa_impl_math_solvers_template.F90:333
#21 0x0000555555aed081 in ELPA_Solver::eigenvector(std::complex<double>*, double*, std::complex<double>*) ()
#22 0x0000555555aee25f in ELPA_Solver::generalized_eigenvector(std::complex<double>*, std::complex<double>*, int&, double*, std::complex<double>*) ()
#23 0x0000555555817aac in hsolver::DiagoElpa<std::complex<double> >::diag(hamilt::Hamilt<std::complex<double>, psi::DEVICE_CPU>*, psi::Psi<std::complex<double>, psi::DEVICE_CPU>&, double*) ()
#24 0x000055555580fd51 in hsolver::HSolverLCAO<std::complex<double>, psi::DEVICE_CPU>::hamiltSolvePsiK(hamilt::Hamilt<std::complex<double>, psi::DEVICE_CPU>*, psi::Psi<std::complex<double>, psi::DEVICE_CPU>&, double*) ()
--Type <RET> for more, q to quit, c to continue without paging--
#25 0x000055555581174c in hsolver::HSolverLCAO<std::complex<double>, psi::DEVICE_CPU>::solveTemplate(hamilt::Hamilt<std::complex<double>, psi::DEVICE_CPU>*, psi::Psi<std::complex<double>, psi::DEVICE_CPU>&, elecstate::ElecState*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) ()
#26 0x0000555555811f85 in hsolver::HSolverLCAO<std::complex<double>, psi::DEVICE_CPU>::solve(hamilt::Hamilt<std::complex<double>, psi::DEVICE_CPU>*, psi::Psi<std::complex<double>, psi::DEVICE_CPU>&, elecstate::ElecState*, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool) ()
#27 0x000055555592741c in ModuleESolver::ESolver_KS_LCAO<std::complex<double>, std::complex<double> >::hamilt2density(int, int, double) ()
#28 0x00005555558e4e23 in ModuleESolver::ESolver_KS<std::complex<double>, psi::DEVICE_CPU>::Run(int, UnitCell&) ()
#29 0x00005555557bf514 in Relax_Driver<double, psi::DEVICE_CPU>::relax_driver(ModuleESolver::ESolver*) ()
#30 0x00005555557d1c99 in Driver::driver_run() ()
#31 0x00005555557d0ee5 in Driver::atomic_world() ()
#32 0x00005555557d1702 in Driver::init() ()
#33 0x00005555555ad1e4 in main ()
Hi,
I found this issue when building ELPA on the AMD Zen 4 architecture with AVX512. The build broke with multiple errors like:
src/elpa2/kernels/complex_128bit_256bit_512bit_BLOCK_template.c:2574:37: warning: implicit declaration of function '_XOR_EPI' [-Wimplicit-function-declaration]
2574 | h1_real = (__SIMD_DATATYPE) _XOR_EPI((__m512i) h1_real, (__m512i) sign);
| ^~~~~~~~
src/elpa2/kernels/complex_128bit_256bit_512bit_BLOCK_template.c:2574:9: error: cannot convert a value of type 'int' to vector type '__vector(8) double' which has different size
2574 | h1_real = (__SIMD_DATATYPE) _XOR_EPI((__m512i) h1_real, (__m512i) sign);
According to 'configure.ac', either the 'HAVE_AVX512_XEON' or the 'HAVE_AVX512_XEON_PHI' macro should automatically be defined if the CPU has AVX512 support. This works on Intel CPUs; however, on AMD CPUs the configure script wrongly defines HAVE_AVX512_XEON_PHI and leaves HAVE_AVX512_XEON undefined. This leads to execution of the code path guarded by the #ifdef HAVE_AVX512_XEON_PHI directive, in which an undefined macro _XOR_EPI is used. I'm sure the same error will also appear on Xeon Phi, as the logic is similar.
If I understood correctly, the _XOR_EPI macro should actually be replaced with _SIMD_XOR_EPI, which is correctly defined as _mm512_xor_epi64 for AVX512 registers. I'm attaching a patch with the fix for this issue (created from ELPA-2023.05.001):
ELPA-2023.05.001_fix_AVX512_support.patch
I'm interested in benchmarking ELPA's performance when computing the eigenvectors of a matrix. I assume this (https://github.com/marekandreas/elpa/blob/master/python/examples/example.py) is the hello-world program I should start with? If I want to use a GPU, how do I modify this program for CUDA matrix input? Thanks!
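As a CPU-only starting point before involving pyelpa or CUDA, a benchmark harness can be sketched with numpy.linalg.eigh standing in for the ELPA solver. Names and sizes here are illustrative, not part of the ELPA/pyelpa API.

```python
import time
import numpy as np

def benchmark_eigh(n, seed=0):
    """Time a dense symmetric eigensolve; eigh stands in for ELPA."""
    rng = np.random.default_rng(seed)
    a = rng.standard_normal((n, n))
    a = (a + a.T) / 2.0                 # symmetrize, as ELPA expects
    t0 = time.perf_counter()
    w, v = np.linalg.eigh(a)            # eigenvalues ascending, vectors in columns
    elapsed = time.perf_counter() - t0
    # sanity check on the first eigenpair: A v = lambda v
    assert np.allclose(a @ v[:, 0], w[0] * v[:, 0], atol=1e-8)
    return elapsed, w

elapsed, w = benchmark_eigh(500)
print(f"n=500 solved in {elapsed:.3f} s, smallest eigenvalue {w[0]:.3f}")
```

Swapping the eigh call for the distributed ELPA solver (and moving the matrix to the GPU) is then a localized change in the harness.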
Building the latest 2022.11.001.rc2 release with --enable-nvidia-gpu=yes fails with the following error message:
../test/C/test.c:171:10: fatal error: test/shared/GPU/test_gpu_vendor_agnostic_layerVariables.h: No such file or directory
171 | #include "test/shared/GPU/test_gpu_vendor_agnostic_layerVariables.h"
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
With --disable-c-tests --disable-cpp-tests
the build succeeds.
The system environment is CentOS 7 and I'm using the Intel oneAPI toolkit with devtoolset-9, so the compilers are the latest ifort plus GCC 9.
As for hardware, the CPU supports AVX512 and the GPU is an NVIDIA A100.
ELPA can be compiled successfully without CUDA support:
FC=mpiifort CC=mpiicc ../configure FCFLAGS="-O3 -xCORE-AVX512" CFLAGS="-O3 -xCORE-AVX512" --enable-option-checking=fatal SCALAPACK_LDFLAGS=" -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -ldl" SCALAPACK_FCFLAGS=" -I"${MKLROOT}/include"" --enable-avx2 --enable-avx512
However, if I add gpu support:
../configure FC=mpiifort CC=mpicc FCFLAGS="-O3 -xCORE-AVX512" CFLAGS="-O3 -march=skylake-avx512 -mfma -funsafe-loop-optimizations -funsafe-math-optimizations -ftree-vect-loop-version -ftree-vectorize" SCALAPACK_LDFLAGS=" -L${MKLROOT}/lib/intel64 -lmkl_scalapack_lp64 -lmkl_cdft_core -lmkl_blacs_intelmpi_lp64 -liomp5 -lpthread -lm -ldl" SCALAPACK_FCFLAGS=" -qmkl=parallel" --enable-avx2 --enable-avx512 --enable-nvidia-gpu --with-cuda-path=/usr/local/cuda --with-NVIDIA-GPU-compute-capability=sm_80
I got this:
the repositories of your Linux distribution: there exist pre-build packages for a number of Linux distributions like Fedora, Debian, and OpenSuse
I was searching for ELPA in the Debian distribution packages, but I was not able to find it. What is the package name? Am I supposed to install it with sudo apt-get install pyelpa? Thanks!
According to the USERS_GUIDE:
The user can change this default value by setting an enviroment variable to the desired value.
The name of this variable is always constructed in the following way:
ELPA_DEFAULT_tunable_parameter_name=value
(By the way environment is misspelled as enviroment.)
However, this doesn't seem to work with the nvidia-gpu, amd-gpu, and intel-gpu variables. Various online resources mention that a dash is not allowed in the name of an environment variable; for example, see this post on Stack Overflow.
So to work around this in bash, something like
env 'ELPA_DEFAULT_nvidia-gpu=1' ./test_elpa.x ...
is needed, instead of the more familiar export ELPA_DEFAULT_nvidia-gpu=1.
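The dash restriction applies only to the shell's variable syntax, not to the process environment itself, which is why the env trick works. A minimal sketch of this distinction, using a Python child process as a stand-in for test_elpa.x:

```python
import os
import subprocess
import sys

# The shell's `export` rejects names containing a dash, but the process
# environment itself has no such restriction: a dashed name can be
# injected when spawning the program, exactly as `env 'NAME=value' cmd`
# does. The child below is a stand-in for test_elpa.x.
child_env = dict(os.environ)
child_env["ELPA_DEFAULT_nvidia-gpu"] = "1"

out = subprocess.run(
    [sys.executable, "-c",
     "import os; print(os.environ['ELPA_DEFAULT_nvidia-gpu'])"],
    env=child_env, capture_output=True, text=True,
).stdout.strip()
print(out)  # the child process sees the dashed variable and prints 1
```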
It would be great if this could be clarified in the USERS_GUIDE, or, even better, if nvidia_gpu, amd_gpu, and intel_gpu could be created as aliases for the existing names.
https://elpa.mpcdf.mpg.de/elpa-tar-archive, mentioned at https://github.com/marekandreas/elpa/blob/master/README.md?plain=1#L66, is not working.
I think the new URL is https://elpa.mpcdf.mpg.de/software/tarball-archive/ELPA_TARBALL_ARCHIVE.html
Currently elpa installation adds
lib/libelpa.so
lib/pkgconfig/elpa-2020.05.001.pc
files.
When I use pkg_search_module in CMake to look for elpa, it searches for elpa.pc but doesn't recognize elpa-VERSION.pc, so I run into a chicken-and-egg problem: I can only search with the version, but if I haven't found ELPA yet, how do I know the version?
I'd like to see how other people solve this issue. Or could elpa.pc simply be installed without the VERSION suffix?
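One possible workaround is to glob for the versioned file before asking pkg-config, recovering module name and version in one pass. A hedged sketch (the directory and file below are stand-ins for the real install tree, and find_elpa_modules is my own helper, not part of ELPA):

```python
import pathlib
import re
import tempfile

# Stand-in for <prefix>/lib/pkgconfig containing a versioned ELPA pc-file.
pcdir = pathlib.Path(tempfile.mkdtemp())
(pcdir / "elpa-2020.05.001.pc").write_text(
    "Name: elpa\nVersion: 2020.05.001\n")

def find_elpa_modules(directory):
    """Return (module, version) pairs for elpa[_openmp]-<VERSION>.pc files."""
    pattern = re.compile(r"^(elpa(?:_openmp)?)-(.+)\.pc$")
    hits = []
    for f in directory.glob("elpa*.pc"):
        m = pattern.match(f.name)
        if m:
            hits.append((f.stem, m.group(2)))  # (module name, version)
    return sorted(hits)

print(find_elpa_modules(pcdir))  # → [('elpa-2020.05.001', '2020.05.001')]
```

The discovered module name can then be handed to pkg_search_module (or to pkg-config directly) without having to know the version up front.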
Hello, Sir/Madam.
The configure script reports a Fortran-related error; the logs are attached.
Could you help ?
Thanks.
ELPA is now also available via the conda package manager from the conda-forge channel (https://anaconda.org/conda-forge/elpa) for Linux (x86_64, ppc64le, aarch64) and OSX (x86_64) as serial, threaded and MPI parallel version (both MPICH and OpenMPI).
The feedstock repository is available here. If you are interested in co-maintaining ELPA on conda-forge let me know (maintaining a package on conda-forge is not much work as most tasks are automated by bots).
I changed to the latest ELPA and now this error is reported:
ELPA_SETUP ERROR: your provided blacsgrid is not ok!
BLACS_GRIDINFO returned an error! Aborting...
What exactly is not ok, and how should I fix the BLACS grid (4x4 in my case)?
Dear ELPA developers,
We have been successfully using CPU kernels of ELPA in DFT-FE code (https://github.com/dftfeDevelopers/dftfe), where I am one of the lead developers. We use ELPA on a subset of MPI ranks of the MPI_COMM_WORLD used by DFT-FE. In particular, we use MPI_Comm_create_group to create the new communicator which we pass to ELPA. Recently, I have been trying to use the GPU kernels of ELPA (using elpa-2020.11.001.rc1) on Summit supercomputer using the same route, but the code gets stuck after printing:
Initializing the GPU devices
when using a subset of the ranks. It works fine if all ranks are used.
Upon investigating the ELPA source code, I found the issue to be the call to mpi_allreduce where MPI_COMM_WORLD is used.
elpa/src/GPU/check_for_gpu.F90
Line 101 in 5bff935
#ifdef WITH_MPI
call mpi_allreduce(numberOfDevices, maxNumberOfDevices, 1, MPI_INTEGER, MPI_MAX, MPI_COMM_WORLD, mpierr)
if (maxNumberOfDevices .ne. numberOfDevices) then
print *,"Different number of GPU devices on MPI tasks!"
print *,"GPUs will NOT be used!"
gpuAvailable = .false.
return
endif
#endif
Since this is a sanity check, I have temporarily bypassed the issue by commenting out the lines above. Now I am able to run the GPU kernels successfully on a subset of ranks.
I would be very grateful for any guidance in resolving this issue cleanly, either in the way we interface with ELPA or, if possible, in the ELPA source code itself.
Thank you,
Sambit
We have built cp2k-9.1 for NVIDIA A100 and installed elpa-2021.11.001 via its toolchain.
We get the following error message when ELPA is called from cp2k-9.1:
Initializing the GPU devices
Found 8 GPUs
MPI rank 0 uses GPU #0
ELPA: Warning, GPU usage has been requested but compute kernel is set by the user as non-GPU!
The compute kernel will be executed on CPUs!
I notice that this error comes from a conditional branch around L. 796 in src/elpa2/elpa2_template.F90.
It arises if both the following variables are TRUE: WITH_REAL_NVIDIA_SM80_GPU_KERNEL and GPU_KERNEL.
We have both the normal GPU kernel and the kernel for NVIDIA A100 in our executables built via the toolchain of cp2k-9.1, and this seems to be the source of the problem.
For our purpose, it would suffice if we can run the normal GPU kernel (instead of the one for A100).
Is it possible to stop building the new A100 GPU kernel via the configure options?
The following are our current configure options.
Our system is Intel Xeon Platinum 8360Y (two sockets), equipped with eight A100 GPUs.
The compilers are intel oneAPI compilers (2021.2.0) and cuda 11.2.
../configure --libdir="${pkg_install_dir}/${TARGET}/lib" \
--enable-openmp=yes \
--enable-shared=no \
--enable-static=yes \
${other_kernel_flags} \
--enable-nvidia-gpu=yes \
--with-cuda-path=${CUDA_PATH} \
--with-NVIDIA-GPU-compute-capability=sm_80 \
${other_config_flags}
I appreciate your help on this issue. Thank you in advance.
During make check of elpa-2023.11.001 on my PC, all complex evp tests failed; the failure seems to come from an if in test/shared/test_check_correctness_template.F90, line 501:
500 if (nev .ge. 2) then
501 if (errmax .gt. tol_res .or. errmax .eq. 0.0_rk) then
502 status = 1
503 endif
504 else
505 if (errmax .gt. tol_res) then
506 status = 1
507 endif
508 endif
The check errmax .eq. 0.0_rk confuses me. What does this check do? Would a max-error of exactly zero do harm in some calculation? A similar check also appears in other files (e.g. line 450 of test/shared/test_analytic_template.F90), which suggests it is there on purpose.
EDIT: These checks seem to come from old commits such as b9bbba2, with no further information. Would deleting them introduce a bug?
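For reference, the quoted Fortran status logic transcribed into a small sketch. One plausible reading (an assumption on my part, not confirmed by the commit history) is that an exactly-zero residual is treated as a sign of a broken test rather than a perfect result, since a genuinely zero max-error is practically impossible in floating point:

```python
# Transcription of the status logic quoted from
# test/shared/test_check_correctness_template.F90 (lines 500-508).
# tol_res here is an illustrative placeholder value.
def check_status(nev, errmax, tol_res=1e-12):
    if nev >= 2:
        # an errmax of exactly 0.0 is also flagged as a failure
        if errmax > tol_res or errmax == 0.0:
            return 1
        return 0
    return 1 if errmax > tol_res else 0

print(check_status(4, 1e-14))  # → 0: small nonzero residual passes
print(check_status(4, 0.0))    # → 1: exactly-zero residual is flagged
print(check_status(1, 0.0))    # → 0: but not in the nev < 2 branch
```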
Hi, I have solved the same matrix with the 1-stage solver, the 2-stage solver, and ScaLAPACK, but I get three different results. It is a 240x240 symmetric matrix, and I compute all eigenvalues. All the tests have passed. What could cause this?
0.173720131800915 0.173878822327725 0.187152350855710
0.191423385819358 0.194181552429425 0.194181552844384
0.194449431187434 0.194449431797715 0.199507996641544
-2.07585731052087 -2.03591585272923 -1.72390402574035
-1.66384803970337 -1.52011422574109 -1.43106927116121
-1.39056098837714 -1.29185171666780 -1.24300500320917
-3.14666685601262 -3.14303290459765 -3.13546185348343
-3.00895984604710 -2.99530541682141 -2.98274897202263
-2.90288228355549 -2.90085399507448 -2.87626101080164
call blacs_pinfo(my_proc, nprocessors)
call blacs_get(0, 0, icontext)
call blacs_gridinit(icontext, 'C', nprow, npcol)
call blacs_gridinfo(Icontext, nprow, npcol, myrow, mycol)
ir = max(1, numroc(s%norbitals, nb, myrow, 0, nprow))
ic = max(1, numroc(s%norbitals, nb, mycol, 0, npcol))
call descinit(desc_x, s%norbitals, s%norbitals, nb, nb, 0, 0, &
& icontext, ir, info)
ICTXT = desca(CTXT_para)
MB =desca(MB_para)
NB = desca(NB_para)
if(MB .ne. NB) stop " not support block size not equal of row and col"
call blacs_gridinfo(ICTXT, NPROW, NPCOL, my_prow, my_pcol)
NLROW = numroc(n, MB, my_prow, 0, NPROW) ! number of rows contained in mine
NLCOL = numroc(n, MB, my_pcol, 0, NPCOL)
bandwidth = MB
if (elpa_init(CURRENT_API_VERSION) /= ELPA_OK) then
print *, "ELPA API version not supported"
stop 1
endif
e => elpa_allocate(error_elpa)
call e%set("na", int(n,kind=c_int), error_elpa)
call e%set("nev", int(nev,kind=c_int), error_elpa)
call e%set("local_nrows", int(NLROW,kind=c_int), error_elpa)
call e%set("local_ncols", int(NLCOL,kind=c_int), error_elpa)
call e%set("nblk", int(MB,kind=c_int), error_elpa)
call e%set("mpi_comm_parent", int(MPI_COMM_WORLD,kind=c_int), error_elpa)
call e%set("process_row", int(my_prow,kind=c_int), error_elpa)
call e%set("process_col", int(my_pcol,kind=c_int), error_elpa)
call e%set("bandwidth", int(bandwidth,kind=c_int), error_elpa)
error_elpa = e%setup()  ! setup is required after the set calls; it appears to be missing in the original snippet
call e%eigenvectors(a, ev, z, error_elpa) ! use environment to decide which solver is used
See ./test-suite.log
Please report to [email protected]
make[3]: *** [Makefile:84658: test-suite.log] Error 1
make[3]: Leaving directory '/home/aarav/wien2k/elpa/build'
make[2]: *** [Makefile:84766: check-TESTS] Error 2
make[2]: Leaving directory '/home/aarav/wien2k/elpa/build'
make[1]: *** [Makefile:90928: check-am] Error 2
make[1]: Leaving directory '/home/aarav/wien2k/elpa/build'
make: *** [Makefile:90930: check] Error 2
Here is the feedback from spack's pull-request spack/spack#33439
You need to add a requirement for AC_PROG_CPP in the autotools build system.
elpa_generated.h
checks if HAVE_SKEWSYMMETRIC
is defined, but the same check is not contained in elpa_generic.h
, causing compilation issues. I have a quick and dirty fix in this commit dmejiar@7390e12, but I am not sure this is the best way to handle the issue.
Dear @marekandreas !
We observe a problem compiling recent ELPA with CUDA:
test/C/test.c:171:10: fatal error: ../shared/GPU/test_gpu_vendor_agnostic_layerVariables.h: No such file or directory
171 | #include "../shared/GPU/test_gpu_vendor_agnostic_layerVariables.h"
    |          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.
>> make[1]: *** [Makefile:79038: test/C/validate_c_version_complex_double_eigenvalues_1stage_gpu_analytic_explicit-test.o] Error 1
We use spack to compile elpa. The spec is simply "elpa%gcc +cuda cuda_arch=80"
Unless I'm missing something, --enable-gpu-streams=nvidia
requires --enable-nvidia-gpu
, which is however deprecated. Using the new variable --enable-nvidia-gpu-kernels
results in
>> 414 configure: error: If --enable-gpu-streams=nvidia is set, you must also use --enable-nvidia-gpu
PS: Is this a mirror of https://gitlab.mpcdf.mpg.de/elpa/elpa? I could not open an issue in the original repository.
Hi
I am getting a deadlock when running the H2O-32-RI-dRPA-TZ.inp case on 16 MPI processes.
I start like this:
MPI_PER_GPU=2 mpirun --bind-to none -n 16 binder.sh ../../../exe/local_cuda/cp2k.psmp -i H2O-32-RI-dRPA-TZ.inp
Very quickly the program hangs after:
p coordinates 3 0.000 0.000 0.6
p buffer 3 0.000 0.000 0.6
p layout 3 0.000 0.000 0.2
p allocation 2 0.000 0.000 0.0
p init 2 0.000 0.000 0.1
From the gdb stacks you can see that 16 ranks are calling
gdb_1431574.out:#12 0x00007f0d96a1a74f in __elpa2_impl_MOD_elpa_solve_evp_real_2stage_double_impl () from /opt/elpa/lib/libelpa_openmp.so.15
You can see that 4 stacks go here; the first 4 go here:
grep __elpa2_compute_MOD_bandred_real_double *.out
gdb_1431574.out:#11 0x00007f0d969f8e7b in __elpa2_compute_MOD_bandred_real_double ()
gdb_1431575.out:#14 0x00007fe14a904325 in __elpa2_compute_MOD_bandred_real_double ()
gdb_1431577.out:#10 0x00007f9bbc664325 in __elpa2_compute_MOD_bandred_real_double ()
gdb_1431581.out:#14 0x00007f2884ddd325 in
and end up in an MPI reduction, while the others call __mod_check_for_gpu_MOD_check_for_gpu directly and go here:
#10 0x00007f40d09dcbbd in ompi_allreduce_f (sendbuf=0x7ffe6a2269d8 "\001",
recvbuf=0x7ffe6a2265ec "\001", count=0x7f40fb7a9d00,
datatype=, op=0x7f40fb7a9d00, comm=,
ierr=0x7ffe6a2265e8) at pallreduce_f.c:87
#11 0x00007f40fb724503 in __mod_check_for_gpu_MOD_check_for_gpu ()
from /opt/elpa/lib/libelpa_openmp.so.15
#12 0x00007f40fb7419f7 in __elpa2_impl_MOD_elpa_solve_evp_real_2stage_double_impl () from /opt/elpa/lib/libelpa_openmp.so.15
#13 0x00007f40fb6a06f7 in __elpa_impl_MOD_elpa_eigenvectors_d ()
To summarize ... 12 are already in the PMPI_Allreduce while 4 are still doing something else ..
I hope this gives you some guidance to solve this bug.
Please do not hesitate to contact me directly at Gunter Roth [email protected]
It would be a pleasure to provide any missing information, and thanks again for all your ELPA efforts.
Gunter
I am also attaching my summary file debug_H2O-32-RI-dRPA-TZ.txt.
Hi,
Is it possible to use ELPA
for small matrices when the number of mpi processes n_procs is larger than the number of elements in the matrix?
For practical purposes I would sometimes like to use more processes than the size of my matrix.
Currently, we can use ELPA
or ScaLAPACK
to diagonalise all of the matrices in our code. The first one can be very small but the second is typically 10,000 x 10,000.
For example, if the first matrix is 2x2 and I use 8 or more processes, the calculation fails with ELPA, but with ScaLAPACK it runs.
For practical reasons I cannot share the code but both ScaLAPACK and ELPA runs use the same grid setup using BLACS_Gridinit
and descinit
etc.
Is there a simple explanation for this?
The errors/warning I get for the 2x2 matrix with 8 procs are:
ELPA: Warning, block size too large for this matrix size and process grid!
Choose a smaller block size if possible.
All the success codes are checked, including the one from elpa%setup(), and it fails when the success code returned by elpa%eigenvectors is not equal to ELPA_OK.
As for a 1x1 matrix (running with 2 procs) I get a different error:
ELPA_SETUP ERROR: your provided blacsgrid is not ok!
BLACS_GRIDINFO returned an error! Aborting...
I know it may seem stupid to use ELPA for a 1x1 matrix, but the code is structured to use one diagonaliser for everything. With ScaLAPACK we can do this, but ELPA appears to have different criteria.
I can possibly implement a workaround to deal with small matrices but I'd prefer to make minimal changes.
The ELPA version is 2021.05.002.
I have tried two Intel compilers: 19.0.0.117 20180804 and 2021.4.0 20210910.
Sorry if this is too vague. I guess my main question is not how to fix my specific problem, but whether it is in principle possible to use ELPA for small matrices when the number of MPI processes n_procs is large.
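One workaround on the caller's side is to shrink the process grid and block size for tiny matrices before handing them to the solver, leaving surplus ranks outside the grid. A hedged sketch of such a heuristic (the function name and thresholds are my own, not from ELPA):

```python
import math

def small_matrix_layout(na, n_procs, nblk_default=64):
    """Pick a BLACS grid (nprow x npcol) and block size nblk that stay
    valid even when n_procs exceeds the matrix order na; ranks beyond
    nprow * npcol simply stay outside the grid and skip the solve."""
    nprow = max(1, min(na, int(math.sqrt(n_procs))))
    npcol = max(1, min(na, n_procs // nprow))
    nblk = max(1, min(nblk_default, na // max(nprow, npcol)))
    return nprow, npcol, nblk

print(small_matrix_layout(2, 8))       # → (2, 2, 1): 2x2 matrix on 8 ranks
print(small_matrix_layout(10000, 16))  # → (4, 4, 64): large case unchanged
```

The same layout can then be fed to blacs_gridinit/descinit for both the ELPA and the ScaLAPACK paths.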
Hi @marekandreas !
It is probably an issue with the header guards. To reproduce:
a.cpp:
#include <elpa/elpa.h>
int main()
{
return 0;
}
b.cpp:
#include <elpa/elpa.h>
void foo()
{
}
gcc a.cpp b.cpp -I/path/to/elpa/include/elpa_openmp-2022.11.001.rc2/ -L/path/to/elpa/elpa/lib -lelpa_openmp
leads to
/usr/bin/ld: /tmp/ccK00fbG.o: in function `elpa_set(elpa_struct*, char const*, int, int*)':
b.cpp:(.text+0x0): multiple definition of `elpa_set(elpa_struct*, char const*, int, int*)'; /tmp/cc51CaRG.o:a.cpp:(.text+0x0): first defined here
/usr/bin/ld: /tmp/ccK00fbG.o: in function `elpa_set(elpa_struct*, char const*, double, int*)':
b.cpp:(.text+0x35): multiple definition of `elpa_set(elpa_struct*, char const*, double, int*)'; /tmp/cc51CaRG.o:a.cpp:(.text+0x35): first defined here
/usr/bin/ld: /tmp/ccK00fbG.o: in function `elpa_get(elpa_struct*, char const*, int*, int*)':
b.cpp:(.text+0x75): multiple definition of `elpa_get(elpa_struct*, char const*, int*, int*)'; /tmp/cc51CaRG.o:a.cpp:(.text+0x75): first defined here
/usr/bin/ld: /tmp/ccK00fbG.o: in function `elpa_get(elpa_struct*, char const*, double*, int*)':
b.cpp:(.text+0xac): multiple definition of `elpa_get(elpa_struct*, char const*, double*, int*)'; /tmp/cc51CaRG.o
...
Hi,
I have observed that CP2K deadlocks with certain rank counts when running with the ELPA backend. I don't know ELPA very well, but with the debugger I think I could gather enough info to pinpoint the problem.
This is what happens in the run (with 8 MPI ranks): the ranks enter check_for_gpu() (lines 442 and 507). As the first 4 ranks were already initialized, they exit the function early (line 453) and do not reach the Allreduce, so the last 4 ranks hang there forever.
I don't know whether calling ELPA with different communicator sizes is allowed or not, but my first thought would be that check_for_gpu() should first query the value of every rank's gpuIsInitialized and restart the initialization for everyone if any rank was not initialized; this seems to fix the deadlock in my case.
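The proposed fix can be sketched abstractly: the ranks first agree collectively on whether everyone is already initialized, and only then may anyone take the early-exit path, so all ranks execute the same sequence of collective calls. A toy model with plain Python stand-ins (allreduce_min plays the role of an MPI_Allreduce with MPI_MIN; the dict fields are hypothetical):

```python
def check_for_gpu(state, allreduce_min):
    """Sketch of the suggested logic: the early exit is taken only if
    *all* ranks agree they are initialized, so no rank is left waiting
    alone in a later collective call."""
    everyone_ready = allreduce_min(int(state["initialized"]))
    if everyone_ready:
        return state["gpu_ok"]      # all ranks take this branch together
    state["initialized"] = True     # otherwise everyone (re)initializes;
    state["gpu_ok"] = True          # the actual device setup is elided
    return state["gpu_ok"]

# Simulate 8 ranks where only the first 4 were initialized earlier.
states = [{"initialized": r < 4, "gpu_ok": r < 4} for r in range(8)]
ready = min(int(s["initialized"]) for s in states)  # the collective result
print([check_for_gpu(s, lambda v: ready) for s in states])  # → [True] * 8
```

Because the agreement happens before the early exit, the mixed-state case above ends with every rank reinitialized instead of half of them blocked in the Allreduce.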
Hi,
I know it may seem like a novice question, but I am having difficulties running the C tests.
If I am not mistaken, I need to compile the tests inside the test/C folder, right? How do I compile them?
I have installed elpa using https://xconfigure.readthedocs.io/en/latest/elpa/
I would really love the help. thanks in advance!
Hello,
I am trying to compile elpa-2021.11.002 on my HPC cluster. I have the following modules loaded:
Running make, the program compiles for a while, and eventually stops with this error:
GEN libelpa.la
ld: cannot find -ludev
make[1]: *** [libelpa.la] Error 1
make[1]: Leaving directory `/net/fs2k02/srv/export/kaxiras/share_root/dbennett/elpa-2021.11.002/build'
make: *** [all] Error 2
I tried configuring using the instructions here and here. I also tried simply running configure, specifying the compilers and linking to ScaLAPACK following the instructions in INSTALL.md:
FC=mpifort CC=mpicc ../configure \
SCALAPACK_LDFLAGS="-L$MKL_HOME/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread -lm -Wl,-rpath,$MKL_HOME/lib/intel64" \
SCALAPACK_FCFLAGS="-L$MKL_HOME/lib/intel64 -lmkl_scalapack_lp64
-lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64
-lpthread -lm -I$MKL_HOME/include/intel64/lp64"
But both give the same error. I couldn't find any reference to libudev in the code, and I'm not sure why it is being linked at all.
Any advice is much appreciated.
Log files and configure script attached here
config.txt
configure-elpa-skx.txt
make.txt
Thanks,
Daniel Bennett
Dear @marekandreas !
I'm running the latest release of ELPA on GPUs and I'm getting this error message:
Error in executing copy_double_complex_a_tmatc_kernel: invalid configuration argument
However, it seems to be harmless, as the calculation runs fine and all the data seems to be copied properly. This happens on 4x4 and 2x2 BLACS grids and with different matrix sizes.
With kind regards,
Anton.
Hi, I compiled ELPA with AMD GPU support and MPI but hit the following two errors; how can I resolve them? Thank you!
PPFC src/helpers/libelpa_private_la-mod_precision.lo
PPFC src/helpers/libelpa_private_la-mod_omp.lo
PPFC src/GPU/CUDA/libelpa_private_la-mod_cuda.lo
PPFC src/GPU/ROCm/libelpa_private_la-mod_hip.lo
PPFC src/libelpa_private_la-elpa_generated_fortran_interfaces.lo
PPFC src/libelpa_public_la-elpa_constants.lo
./src/fortran_constants.F90:2:35:
2 | integer(kind=C_INT), parameter ::
| 1
Error: Invalid character in name at (1)
./src/fortran_constants.F90:4:35:
...and more: all of fortran_constants.F90 looks like this. And:
../src/elpa_constants.F90:57:99:
57 | integer(kind=C_INT), parameter :: ELPA_2STAGE_REAL_GPU = ELPA_2STAGE_REAL_NVIDIA_GPU
| 1
Error: Symbol 'elpa_2stage_real_nvidia_gpu' at (1) has no IMPLICIT type; did you mean 'elpa_2stage_real_gpu'?
../src/elpa_constants.F90:58:102:
58 | integer(kind=C_INT), parameter :: ELPA_2STAGE_COMPLEX_GPU = ELPA_2STAGE_COMPLEX_NVIDIA_GPU
| 1
Error: Symbol 'elpa_2stage_complex_nvidia_gpu' at (1) has no IMPLICIT type; did you mean 'elpa_2stage_complex_gpu'?
make[1]: *** [Makefile:75736: src/libelpa_public_la-elpa_constants.lo] Error 1
make[1]: Leaving directory '/work1/jrf/tool/elpa/build-mpi-gpu'
My toolchain is gcc (GCC) 11.2.1 / ROCm 4.3.0 / Intel MPI 2021, configured as:
FC=mpif90 CC=mpicc CXX=hipcc ../configure \
CPP="gcc -E" \
FCFLAGS="-g -O3 " \
CXXFLAGS=" -O3 -DROCBLAS_V3 -D__HIP_PLATFORM_AMD__ --offload-arch=gfx90a -g -O3 -std=c++17 " \
CFLAGS="-O3 -g -O3 -std=c++17 " \
--disable-mpi-module \
--enable-option-checking=fatal \
LIBS="-L$ROC_HOME/lib -Wl,-rpath=$ROC_HOME/lib -L$ROC_HOME/hip/lib -Wl,-rpath=$ROC_HOME/hip/lib -lamdhip64 -fPIC -lrocblas" \
--with-mpi=yes --disable-sse --disable-sse-assembly --disable-avx --disable-avx2 --disable-avx512 \
--enable-amd-gpu --enable-single-precision --enable-gpu-streams=amd --enable-hipcub --disable-cpp-tests --with-rocsolver \
SCALAPACK_LDFLAGS=" -L$MKL_HOME/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread " \
SCALAPACK_FCFLAGS="-I$MKL_HOME/include -I$MKL_HOME/include/intel64/lp64 -L$MKL_HOME/lib/intel64 -lmkl_scalapack_lp64 -lmkl_intel_lp64 -lmkl_sequential -lmkl_core -lmkl_blacs_intelmpi_lp64 -lpthread "
The latest 2022.11.001.rc2
release with --enable-nvidia-gpu=yes
has a memory leak:
=================================================================
==1088853==ERROR: LeakSanitizer: detected memory leaks
Direct leak of 8 byte(s) in 1 object(s) allocated from:
#0 0x7fb6bfa2c302 in __interceptor_malloc ../../../../src/libsanitizer/lsan/lsan_interceptors.cpp:75
#1 0x562628e6b2e1 in cublasCreateFromC (/home/ole/git/cp2k/exe/local_cuda/cp2k.pdbg+0x3dff2e1)
#2 0x562628e6d0e7 in __mod_check_for_gpu_MOD_check_for_gpu ../src/GPU/./handle_creation_template.F90:25
#3 0x562628eb7ac1 in __elpa2_impl_MOD_elpa_solve_evp_real_2stage_a_h_a_double_impl ../src/elpa2/elpa2_template.F90:422
#4 0x562628e2995a in __elpa_impl_MOD_elpa_eigenvectors_a_h_a_d ../src/elpa_impl_math_solvers_template.F90:126
#5 0x562627c75aef in cp_fm_diag_elpa_base /home/ole/git/cp2k/src/fm/cp_fm_elpa.F:537
...
I compiled ELPA-2023.05.001 on my computer with an AMD MI50 and found that the program fails all tests involving Cholesky decomposition on the GPU with a segmentation fault. From the backtrace, the problem seems to be caused by:
117   if (.not. is_already_decomposed) then
118     ! B = U^T*U, B<-U
119     call self%elpa_cholesky_a_h_a_&
120          &ELPA_IMPL_SUFFIX&
121          &(b, error)
157   call obj%get("gpu_cholesky",gpu_cholesky, error)
158   if (error .ne. ELPA_OK) then
159     write(error_unit,*) "ELPA_CHOLESKY: Problem getting option for gpu_cholesky. Aborting..."
160     success = .false.
161     return
162   endif
163
164   if (gpu_cholesky .eq. 1) then
165     useGPU = (gpu == 1)
166   else
167     useGPU = .false.
168   endif
Would it be OK to simply add a GPU-usage check in either place, or is there further work on the GPU solver in progress? The GPU-solver options do not appear in INSTALL.md but do show up in configure --help; are they usable for other routines (e.g. eigenvectors) now?
All outputs and call stack in the log file.
validate_c_version_complex_double_generalized_1stage_gpu_random_default.sh.log
Source the script below, then run make and make check.
I used ROCmCC instead of hipcc to compile the CPU kernels written in C. It seems CXX could simply be set to hipcc to avoid editing the Makefile, but I have not tested that yet.
In file elpa_impl.F90
the check for the MPI thread level support currently looks like this:
if ((providedMPI .ne. MPI_THREAD_SERIALIZED) .and. (providedMPI .ne. MPI_THREAD_MULTIPLE)) then
#if defined(ALLOW_THREAD_LIMITING)
write(error_unit,*) "WARNING elpa_setup: MPI threading level MPI_THREAD_SERALIZED or MPI_THREAD_MULTIPLE required but &
...
I think the check should be
if ((providedMPI .ne. MPI_THREAD_SERIALIZED) .or. (providedMPI .ne. MPI_THREAD_MULTIPLE)) then
or, as the levels are ordered
if (providedMPI .lt. MPI_THREAD_SERIALIZED) then
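The three forms can be compared directly; the sketch below uses stand-in integer values consistent with the ordering the MPI standard guarantees (MPI_THREAD_SINGLE < MPI_THREAD_FUNNELED < MPI_THREAD_SERIALIZED < MPI_THREAD_MULTIPLE):

```python
# Stand-ins for the MPI thread-level constants, in their guaranteed order.
SINGLE, FUNNELED, SERIALIZED, MULTIPLE = 0, 1, 2, 3

for provided in (SINGLE, FUNNELED, SERIALIZED, MULTIPLE):
    with_and = provided != SERIALIZED and provided != MULTIPLE  # current code
    with_or  = provided != SERIALIZED or provided != MULTIPLE   # first proposal
    with_lt  = provided < SERIALIZED                            # ordered proposal
    print(provided, with_and, with_or, with_lt)
```

The table shows that the .or. form triggers for every level, whereas the .lt. form behaves exactly like the existing .and. check; which behaviour is intended is for the developers to decide.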