Comments (17)

giaf avatar giaf commented on July 29, 2024

What is your compiler?

from blasfeo.

ggleizer avatar ggleizer commented on July 29, 2024

gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)

giaf avatar giaf commented on July 29, 2024

Apparently you need at least gcc 4.7 for that.

You can try to replace -mavx2 with -mavx here
https://github.com/giaf/blasfeo/blob/master/Makefile.rule#L121
(and possibly remove -mfma if it complains about it), and see if you can still use the assembly kernels optimized for Haswell.

Otherwise, either use a more recent compiler or an older target (losing some performance).

Let me know how it goes :)

ggleizer avatar ggleizer commented on July 29, 2024

Thank you, sorry for my ignorance :/ - I'd rather update gcc. Updates soon

ggleizer avatar ggleizer commented on July 29, 2024

Looks like I need a newer Red Hat distribution to use gcc 4.8. While I'm downloading the newer version, I tried replacing -mavx2 with -mavx and it didn't work. I also tried removing -mfma and got the same error (assembler errors, it seems). Curiously, if I use INTEL_CORE it works. Could it be an error in architecture selection?

giaf avatar giaf commented on July 29, 2024

By Intel Core architecture (not the Core brand name for Core i7, i5, ...), I mean this
https://en.wikipedia.org/wiki/Intel_Core_(microarchitecture)
that is a rather old architecture.
In BLASFEO, the code for this target uses instructions up to SSE4
https://en.wikipedia.org/wiki/SSE4
that in double precision can perform 1 multiplication and 1 addition of 2-wide vectors per clock cycle.

Intel Sandy Bridge introduced the AVX instruction set
https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
that in double precision can perform 1 multiplication and 1 addition of 4-wide vectors per clock cycle.

Intel Haswell introduced the AVX2 and FMA3 instruction sets
https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation#Fused_multiply.E2.80.93add
that in double precision can perform 2 fused multiply-adds of 4-wide vectors per clock cycle.

Each new computer architecture supports the old instructions, plus possibly additions.
So using an older target (e.g. Core or Sandy Bridge) works for you, but you are not fully exploiting the new instruction sets.

Old compilers are not aware of recent instruction sets, so even if you have a Haswell processor, the gcc 4.4.7 compiler cannot generate code with AVX2 and FMA instructions.

In any case, even with an old Linux distro you can always install more recent versions of gcc from source.

ggleizer avatar ggleizer commented on July 29, 2024

Thank you. Yeah, so in my case it is Haswell indeed.

I'm a very poor Linux user, so I keep trying to figure things out from forums. I'll give it one more try on installing newer versions from source as you suggested. Sorry to keep giving you trouble.

giaf avatar giaf commented on July 29, 2024

No problem, I'm glad to help.

Otherwise you can simply use the Sandy Bridge target. It's not the best one, but the speed-up from using the Haswell target is always less than 2x.

ggleizer avatar ggleizer commented on July 29, 2024

Sandy Bridge works fine too. I'll work on the newer gcc and report back if I'm successful with Haswell then.

ggleizer avatar ggleizer commented on July 29, 2024

GCC updated to 4.8 and BLASFEO now compiles with Haswell target!

Thank you so much!

RoyiAvital avatar RoyiAvital commented on July 29, 2024

@giaf , I think the ISA tests aren't good enough.
They only check if the processor is capable of the ISA (Which you could do with cpu_id()).
You should also check if the correct headers are set.
So at least add an ISA command like: _mm256_loadu_pd or _mm_loadu_pd. It would be better to select commands for AVX2 as well (something with integers).

imciner2 avatar imciner2 commented on July 29, 2024

> They only check if the processor is capable of the ISA (Which you could do with cpu_id())

This is provided as a runtime function, blasfeo_processor_cpu_features, which the user of the library can call to see if the current processor supports the compiled version.

> So at least add an ISA command like: _mm256_loadu_pd or _mm_loadu_pd.

The ISA tests directly compile the assembly mnemonics to test for support of the requested architecture, since the BLASFEO source uses assembly for the target-specific kernels instead of intrinsics. I don't think it uses any intrinsics in the code, so it shouldn't need to include special headers, but @giaf would know better than I about that.

RoyiAvital avatar RoyiAvital commented on July 29, 2024

That's not the case.
For instance, try to compile the project on Skylake without the flag -mavx2. You will get an error.

The current CMakeLists.txt adds the flags, but it would be better if the tests also checked that the correct flags are indeed added and effective.

imciner2 avatar imciner2 commented on July 29, 2024

Hmm, there are some AVX intrinsics in the code apparently. I had thought everything was done in pure assembly. It should be easy for me to add those to the ISA tests though.

giaf avatar giaf commented on July 29, 2024

Yes, some easy stuff like the BLAS level 1 routines is vectorized with intrinsics instead of pure assembly.

@imciner2 great if you can do that!
BTW your ISA tests turned out to be a great feature to use BLASFEO in acados :)

giaf avatar giaf commented on July 29, 2024

BTW I noticed that on some ARM architectures you still need to enable NEON with an assembler flag otherwise the assembler would complain. But in this case no headers are needed.

imciner2 avatar imciner2 commented on July 29, 2024

> great if you can do that!

Done in PR #122 for the X86 intrinsics (which are the only ones used in the code currently).

> BTW your ISA tests turned out to be a great feature to use BLASFEO in acados :)

Great to hear. That was the main reason I developed them, since I remember seeing the issues just setting a high default caused for some people's computers when installing it.

> BTW I noticed that on some ARM architectures you still need to enable NEON with an assembler flag otherwise the assembler would complain. But in this case no headers are needed.

Yea, some ARM architectures get really picky about the flags needed to compile unfortunately.
