Comments (17)
What is your compiler?
from blasfeo.
gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-4)
from blasfeo.
Apparently you need at least gcc 4.7 for that.
You can try to replace -mavx2 with -mavx here
https://github.com/giaf/blasfeo/blob/master/Makefile.rule#L121
(and possibly remove -mfma if the compiler complains about it), and see whether you can still use the assembly kernels optimized for Haswell.
Otherwise, either use a more recent compiler or an older target (losing some performance).
Let me know how it goes :)
from blasfeo.
Thank you, sorry for my ignorance :/ - I'd rather update gcc. Updates soon
from blasfeo.
Looks like I need a newer Red Hat distribution to use gcc 4.8. While I'm downloading the newer version, I tried replacing -mavx2 with -mavx, and it didn't work. I also tried removing -mfma and got the same error (assembler errors, it seems). Curiously, if I use INTEL_CORE it works. Could it be an error in architecture selection?
from blasfeo.
By Intel Core architecture (not the Core brand name for Core i7, i5, ...), I mean this
https://en.wikipedia.org/wiki/Intel_Core_(microarchitecture)
that is a rather old architecture.
In BLASFEO, the code for this target uses instructions up to SSE4
https://en.wikipedia.org/wiki/SSE4
that in double precision can perform 1 multiplication and 1 addition of 2-wide vectors per clock cycle.
Intel Sandy Bridge introduces the AVX instruction set
https://en.wikipedia.org/wiki/Advanced_Vector_Extensions
that in double precision can perform 1 multiplication and 1 addition of 4-wide vectors per clock cycle.
Intel Haswell introduces the AVX2 and FMA3 instruction sets
https://en.wikipedia.org/wiki/Multiply%E2%80%93accumulate_operation#Fused_multiply.E2.80.93add
that in double precision can perform 2 fused multiply-adds of 4-wide vectors per clock cycle.
Each new computer architecture supports the old instructions, plus possibly some new ones.
So using an older target (e.g. Core or Sandy Bridge) works for you, but you are not fully exploiting the new instruction sets.
Old compilers are not aware of recent instruction sets, so even if you have a Haswell processor, the gcc 4.4.7 compiler cannot generate code with AVX2 and FMA instructions.
In any case, even with an old Linux distro you can always install a more recent version of gcc from source.
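As a rough illustration of the per-cycle figures quoted above, the peak double-precision throughput of each target can be written as instructions per cycle times vector lanes times flops per lane. This is only a sketch of the arithmetic; the function names are made up for this example and are not part of BLASFEO.

```c
#include <stdio.h>

/* Peak double-precision flops per clock cycle:
 * instructions issued per cycle * vector lanes * flops per lane.
 * Hypothetical helper, for illustration only. */
int peak_dp_flops_per_cycle(int instr_per_cycle, int lanes, int flops_per_lane)
{
    return instr_per_cycle * lanes * flops_per_lane;
}

/* Prints the three cases described in the comment above. */
void print_peak_table(void)
{
    /* Core (SSE): 1 mul + 1 add per cycle, 2-wide vectors, 1 flop per lane. */
    printf("Core:         %d flops/cycle\n", peak_dp_flops_per_cycle(2, 2, 1));
    /* Sandy Bridge (AVX): 1 mul + 1 add per cycle, 4-wide vectors. */
    printf("Sandy Bridge: %d flops/cycle\n", peak_dp_flops_per_cycle(2, 4, 1));
    /* Haswell (AVX2 + FMA): 2 FMAs per cycle, 4-wide, 2 flops per lane. */
    printf("Haswell:      %d flops/cycle\n", peak_dp_flops_per_cycle(2, 4, 2));
}
```

Under these assumptions the three targets come out at 4, 8 and 16 flops per cycle, so each step up at most doubles the theoretical peak.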
from blasfeo.
Thank you. Yeah, so in my case it is Haswell indeed.
I'm a very poor Linux user, so I keep trying to figure things out from forums. I'll give it one more try on installing newer versions from source as you suggested. Sorry to keep giving you trouble.
from blasfeo.
No problem, I'm glad to help.
Otherwise you can simply use the Sandy Bridge target. It's not the best one, but the speedup from using the Haswell target is always less than 2x.
from blasfeo.
Sandy Bridge works fine too. I'll work on the newer gcc and report back if I'm successful with Haswell then.
from blasfeo.
GCC updated to 4.8 and BLASFEO now compiles with Haswell target!
Thank you so much!
from blasfeo.
@giaf, I think the ISA tests aren't good enough.
They only check whether the processor is capable of the ISA (which you could do with cpu_id()).
You should also check that the correct headers are set.
So at least add an ISA command like _mm256_loadu_pd or _mm_loadu_pd. It would be better to select a command for AVX2 as well (something with integers).
from blasfeo.
They only check if the processor is capable of the ISA (Which you could do with cpu_id())
This is provided as a runtime function, blasfeo_processor_cpu_features, which the user of the library can call to see whether the current processor supports the compiled version.
So at least add an ISA command like: _mm256_loadu_pd or _mm_loadu_pd.
The ISA tests directly compile the assembly mnemonics to test for support of the requested architecture, since the BLASFEO source uses assembly for the target-specific kernels instead of intrinsics. I don't think it uses any intrinsics in the code, so it shouldn't need to include special headers, but @giaf would know better than I about that.
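For readers unfamiliar with the runtime side of this, here is a minimal sketch of asking the running processor which instruction sets it supports. It does not use the BLASFEO function mentioned above; it assumes GCC or Clang on x86 and relies on their `__builtin_cpu_supports` builtin. The function names `cpu_has_avx` and `cpu_has_avx2` are hypothetical.

```c
/* Runtime query of CPU features (sketch, not BLASFEO's implementation).
 * Assumes GCC or Clang on x86; on any other platform both functions
 * simply report 0. */
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
#define HAVE_X86_CPU_BUILTINS 1
#else
#define HAVE_X86_CPU_BUILTINS 0
#endif

/* Returns 1 if the running CPU reports AVX support, 0 otherwise. */
int cpu_has_avx(void)
{
#if HAVE_X86_CPU_BUILTINS
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx") != 0;
#else
    return 0;
#endif
}

/* Returns 1 if the running CPU reports AVX2 support, 0 otherwise. */
int cpu_has_avx2(void)
{
#if HAVE_X86_CPU_BUILTINS
    __builtin_cpu_init();
    return __builtin_cpu_supports("avx2") != 0;
#else
    return 0;
#endif
}
```

A check like this only says what the processor can execute. It says nothing about whether the compiler and assembler were able to generate those instructions, which is the separate question the ISA tests address.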
from blasfeo.
That's not the case.
For instance, try to compile the project on Skylake without the flag -mavx2. You will get an error.
The current CMakeLists.txt adds the flags, but it would be better if the tests checked that the correct flags are indeed added and effective.
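To make the dependence on the flag concrete, here is a small sketch of an intrinsics-based routine. It is not BLASFEO code: `axpy_sketch` and `built_with_avx` are hypothetical names, and the example assumes GCC or Clang, which predefine `__AVX__` when AVX code generation is enabled (for example with -mavx). The scalar fallback is there only so the sketch builds everywhere. An actual ISA probe would presumably drop it, so that compilation fails when the flag is missing.

```c
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h> /* AVX intrinsics, usable only when AVX is enabled */
#endif

/* Reports whether this translation unit was compiled with AVX enabled. */
int built_with_avx(void)
{
#ifdef __AVX__
    return 1;
#else
    return 0;
#endif
}

/* y[i] += alpha * x[i], a BLAS level 1 style loop (hypothetical helper). */
void axpy_sketch(size_t n, double alpha, const double *x, double *y)
{
    size_t i = 0;
#ifdef __AVX__
    /* Vectorized part: 4 doubles per iteration, unaligned loads and stores. */
    __m256d va = _mm256_set1_pd(alpha);
    for (; i + 4 <= n; i += 4) {
        __m256d vx = _mm256_loadu_pd(x + i);
        __m256d vy = _mm256_loadu_pd(y + i);
        vy = _mm256_add_pd(vy, _mm256_mul_pd(va, vx));
        _mm256_storeu_pd(y + i, vy);
    }
#endif
    /* Scalar part: the remainder, or the whole loop when AVX is disabled. */
    for (; i < n; i++)
        y[i] += alpha * x[i];
}
```

Checking `built_with_avx()` in a test is one way to confirm that the build system really passed the flag, rather than only that the processor supports the instructions.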
from blasfeo.
Hmm, there are some AVX intrinsics in the code apparently. I had thought everything was done in pure assembly. It should be easy for me to add those to the ISA tests though.
from blasfeo.
Yes, some easy stuff like the BLAS 1 routines is vectorized with intrinsics instead of pure assembly.
@imciner2 great if you can do that!
BTW your ISA tests turned out to be a great feature to use BLASFEO in acados :)
from blasfeo.
BTW I noticed that on some ARM architectures you still need to enable NEON with an assembler flag otherwise the assembler would complain. But in this case no headers are needed.
from blasfeo.
great if you can do that!
Done in PR #122 for the X86 intrinsics (which are the only ones used in the code currently).
BTW your ISA tests turned out to be a great feature to use BLASFEO in acados :)
Great to hear. That was the main reason I developed them, since I remember the issues that simply setting a high default target caused on some people's computers when installing it.
BTW I noticed that on some ARM architectures you still need to enable NEON with an assembler flag otherwise the assembler would complain. But in this case no headers are needed.
Yeah, unfortunately some ARM architectures get really picky about the flags needed to compile.
from blasfeo.