
Comments (7)

terminationshock commented on August 13, 2024

I tested compiling with different -march settings using GCC 13.1 (the compiler I had readily available on that AMD system).
I can indeed reproduce the problem with -march=knl and -march=knm. However, I do not see it with -march=skylake-avx512 or (of course) -march=znver4.
@maxim-masterov could you please tell us what you set for -march in your case? Could you try with -march=skylake-avx512 with your GCC 11.3?

from elpa.

terminationshock commented on August 13, 2024

Hi @maxim-masterov,

On which AMD CPU are you trying to compile ELPA, and which compiler do you use?
We do not see this issue on AMD EPYC 9654 with the Intel compiler.


maxim-masterov commented on August 13, 2024

Hi @terminationshock,
The CPU is an AMD EPYC 9654 96-Core Processor, and the compiler I use is GCC 11.3.0.

The error occurs because GCC 11 doesn't support the Zen 4 architecture. As a result, the AVX-512 checks in the configure.ac script produce the following output:

checking whether we compile for Xeon... no
checking whether we compile for Xeon PHI... yes

This leads to execution of this branch in the configure.ac script:

elpa/configure.ac

Lines 2430 to 2435 in c394aed

else
  if test x"$can_compile_avx512_xeon_phi" = x"yes"; then
    AC_DEFINE([HAVE_AVX512_XEON_PHI],[1],[AVX512 for Xeon-PHI is supported on this CPU])
  else
    AC_MSG_ERROR([Oho! We can neither compile AVX512 intrinsics for Xeon nor Xeon Phi. This should not happen!])
  fi

which defines the HAVE_AVX512_XEON_PHI macro. As a result, the following part of the source code gets compiled:
h1_real = (__SIMD_DATATYPE) _XOR_EPI((__m512i) h1_real, (__m512i) sign);
h1_imag = (__SIMD_DATATYPE) _XOR_EPI((__m512i) h1_imag, (__m512i) sign);

The _XOR_EPI macro is undefined (or at least I haven't found its definition). So the snippet above is compiled as a cast of an integer to a packed double rather than as an invocation of the XOR operation.
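
For context, here is a scalar sketch of what the XOR is presumably meant to do in each 64-bit lane. It assumes that `sign` holds only the sign bit of each lane, which the snippet above does not show; `SKETCH_SIGN_MASK` and `sketch_flip_sign` are hypothetical names used only for illustration and are not part of ELPA.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: the mask has only the top (sign) bit of a 64-bit lane set. */
#define SKETCH_SIGN_MASK UINT64_C(0x8000000000000000)

/* Flip the sign of one double by XOR-ing its bit pattern with the mask.
   This is the scalar counterpart of a per-lane integer XOR on a packed
   double register; it reinterprets the bits and does not convert values. */
double sketch_flip_sign(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);  /* read the bit pattern of x */
    bits ^= SKETCH_SIGN_MASK;        /* toggle only the sign bit */
    memcpy(&x, &bits, sizeof x);     /* write the pattern back */
    return x;
}
```

Calling `sketch_flip_sign` on a value returns the same magnitude with the opposite sign, which is the effect the vectorized code relies on when the XOR macro resolves to a real intrinsic.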


maxim-masterov commented on August 13, 2024

I think something changed between releases 2020.05.001 and 2020.11.001.

In 2020.05.001 the XOR operation looks correct:

#ifdef DOUBLE_PRECISION_COMPLEX
h1_real = (__SIMD_DATATYPE) _SIMD_XOR_EPI((__m512i) h1_real, (__m512i) sign);
h1_imag = (__SIMD_DATATYPE) _SIMD_XOR_EPI((__m512i) h1_imag, (__m512i) sign);
#endif

whereas in 2020.11.001 the _SIMD prefix is removed:
#ifdef DOUBLE_PRECISION_COMPLEX
h1_real = (__SIMD_DATATYPE) _XOR_EPI((__m512i) h1_real, (__m512i) sign);
h1_imag = (__SIMD_DATATYPE) _XOR_EPI((__m512i) h1_imag, (__m512i) sign);
#endif
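
A rename like this can slip through because older GCC releases only warn when an undeclared name is called like a function. One possible safeguard is a preprocessor check that stops the build when the macro is missing. The sketch below uses a hypothetical macro name, `SKETCH_XOR_EPI`, mapped to a plain 64-bit XOR so that it builds on any CPU; it is not how ELPA defines its macros.

```c
#include <stdint.h>

/* Hypothetical stand-in for a SIMD XOR macro. A real multi-ISA header
   would define it inside per-instruction-set #if blocks. */
#define SKETCH_XOR_EPI(a, b) ((uint64_t)(a) ^ (uint64_t)(b))

/* Fail at preprocessing time if no branch defined the macro, instead of
   letting the call fall through as an undeclared function. In this sketch
   the macro is always defined, so the guard never fires. */
#ifndef SKETCH_XOR_EPI
#error "SKETCH_XOR_EPI is not defined for the selected instruction set"
#endif

/* Thin wrapper so the macro can be exercised from ordinary code. */
uint64_t sketch_xor(uint64_t a, uint64_t b) {
    return SKETCH_XOR_EPI(a, b);
}
```

Placed after the per-ISA definitions, a guard of this kind turns a silent mismatch between macro names into an immediate, readable error.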


maxim-masterov commented on August 13, 2024

I used -march=znver2. I can try -march=skylake-avx512, but I don't have access to Genoa CPUs at the moment. I should get it back next week and will report as soon as I have results.

Although -march=skylake-avx512 will (I'm sure) help build the code, it won't resolve the problem with the undefined macro, right?


terminationshock commented on August 13, 2024

Yes, we will look at the macro anyway.

However, I am a bit puzzled that configure detects AVX-512 instructions at all with -march=znver2. The GCC 11 man page does not list AVX-512 for this flag:

znver2
               AMD Family 17h core based CPUs with x86-64 instruction set support. (This supersets BMI, BMI2, CLWB, F16C, FMA, FSGSBASE, AVX, AVX2, ADCX, RDSEED, MWAITX, SHA, CLZERO, AES, PCLMUL, CX16,
               MOVBE, MMX, SSE, SSE2, SSE3, SSE4A, SSSE3, SSE4.1, SSE4.2, ABM, XSAVEC, XSAVES, CLFLUSHOPT, POPCNT, RDPID, WBNOINVD, and 64-bit instruction set extensions.)
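
One way to cross-check what a given -march value actually enables is to test the feature macros the compiler predefines. The sketch below relies on `__AVX512F__` and `__AVX2__`, which GCC and Clang define when the corresponding instruction sets are enabled for the translation unit; the function names are hypothetical and chosen for illustration only.

```c
/* Return 1 if this file was compiled with AVX-512 Foundation enabled
   (the compiler then predefines __AVX512F__), else 0. */
int sketch_has_avx512f(void) {
#if defined(__AVX512F__)
    return 1;
#else
    return 0;
#endif
}

/* Same check for AVX2, which the znver2 description above does list. */
int sketch_has_avx2(void) {
#if defined(__AVX2__)
    return 1;
#else
    return 0;
#endif
}
```

Compiling this file with different -march values and printing the two results shows what the compiler itself considers enabled, independent of what the configure checks conclude.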


maxim-masterov commented on August 13, 2024

@terminationshock Just to confirm, -march=skylake-avx512 did indeed allow the build to finish.

