flame / blis
BLAS-like Library Instantiation Software Framework
License: Other
Title was: Build failure on piledriver
BLIS commit a41e68e
OS: Ubuntu 12.04 (on Travis CI)
CPU: AMD Opteron 6376
Compiler: GCC 4.6.3
Short version of the error message is:
Compiling frame/0/setsc/bli_setsc_check.c
frame/0/getsc/bli_getsc_check.c:1:0: error: bad value (bdver2) for -march= switch
(repeated several times for other files in frame/0)
Full log is at https://travis-ci.org/tkelman/BLIS.jl/jobs/30217818
While perhaps innocuous, such compiler warnings are not ideal...
frame/util/norm1m/bli_norm1m_unb_var1.c: In function 'bli_znorm1m_unb_var1':
frame/util/norm1m/bli_norm1m_unb_var1.c:230: warning: 'ij0' may be used uninitialized in this function
frame/util/norm1m/bli_norm1m_unb_var1.c:230: warning: 'n_shift' may be used uninitialized in this function
frame/util/norm1m/bli_norm1m_unb_var1.c: In function 'bli_cnorm1m_unb_var1':
frame/util/norm1m/bli_norm1m_unb_var1.c:230: warning: 'ij0' may be used uninitialized in this function
frame/util/norm1m/bli_norm1m_unb_var1.c:230: warning: 'n_shift' may be used uninitialized in this function
frame/util/norm1m/bli_norm1m_unb_var1.c: In function 'bli_dnorm1m_unb_var1':
frame/util/norm1m/bli_norm1m_unb_var1.c:230: warning: 'ij0' may be used uninitialized in this function
frame/util/norm1m/bli_norm1m_unb_var1.c:230: warning: 'n_shift' may be used uninitialized in this function
frame/util/norm1m/bli_norm1m_unb_var1.c: In function 'bli_snorm1m_unb_var1':
frame/util/norm1m/bli_norm1m_unb_var1.c:230: warning: 'ij0' may be used uninitialized in this function
frame/util/norm1m/bli_norm1m_unb_var1.c:230: warning: 'n_shift' may be used uninitialized in this function
The ???axpys family of macros could give the incorrect result for alpha and x complex and y real. For example, zzdaxpys calls daxpyris which computes yr += ar * xr instead of the correct result yr += ar * xr - ai * xi.
One way to fix this is to drop all of the scalar macros and just use C99 complex operators like normal people.
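The difference between the two formulas can be sketched with C99 complex arithmetic. The helper names below are hypothetical stand-ins, not BLIS macros; the second function only mimics the behavior described above.

```c
#include <assert.h>
#include <complex.h>

/* Hypothetical helper (not a BLIS macro): y += Re(alpha * x) for complex
   alpha and x and real y, i.e. yr += ar*xr - ai*xi. */
static double axpy_real_dest(double complex alpha, double complex x, double y)
{
    return y + creal(alpha * x);
}

/* Mimics the reported behavior: only the real parts are multiplied,
   i.e. yr += ar*xr, dropping the -ai*xi term. */
static double axpy_real_dest_buggy(double complex alpha, double complex x, double y)
{
    return y + creal(alpha) * creal(x);
}

/* Small self-check with alpha = 2+3i, x = 4+5i, y = 1. */
static void axpy_example(void)
{
    double complex alpha = 2.0 + 3.0 * I;
    double complex x     = 4.0 + 5.0 * I;

    /* (2+3i)(4+5i) = -7 + 22i, so the real update is 1 + (-7) = -6. */
    assert(axpy_real_dest(alpha, x, 1.0) == -6.0);
    /* Real parts only: 1 + 2*4 = 9. */
    assert(axpy_real_dest_buggy(alpha, x, 1.0) == 9.0);
}
```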
I got another error when I attempted to compile the latest version (0.2) for BG/Q:
(1) Undeclared identifier bli_daxpyf_fusefac
Compiling ../config/bgq/kernels/1f/bli_axpyf_opt_var1.c (NOTE: using flags for kernels)
"../config/bgq/kernels/1f/bli_axpyf_opt_var1.c", line 57.20: 1506-045 (S) Undeclared identifier bli_daxpyf_fusefac.
"../config/bgq/kernels/1f/bli_axpyf_opt_var1.c", line 64.39: 1506-098 (E) Missing argument(s).
make: *** [obj/bgq/config/kernels/1f/bli_axpyf_opt_var1.o] Error 1
in file ./config/bgq/kernels/1f/bli_axpyf_opt_var1.c, line 57
if ( b_n < PASTEMAC(d,axpyf_fusefac) || inca != 1 || incx != 1 || incy != 1 || bli_is_unaligned_to( a, 32 ) || bli_is_unaligned_to( y, 32 ) )
use_ref = TRUE;
I cannot find where "axpyf_fusefac" is defined, so I simply commented out this line to call the reference DAXPYF function.
(2) More errors in bli_gemm_int_8x8.c
But when I commented out line 57, I got a bunch of error messages in bli_gemm_int_8x8.c:
Compiling ../config/bgq/kernels/3/bli_gemm_int_8x8.c (NOTE: using flags for kernels)
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 133.28: 1506-754 (S) The parameter type is not valid for a function of this linkage type.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 134.28: 1506-754 (S) The parameter type is not valid for a function of this linkage type.
...
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 231.28: 1506-045 (S) Undeclared identifier c_z.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 262.14: 1506-754 (S) The parameter type is not valid for a function of this linkage type.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 263.14: 1506-754 (S) The parameter type is not valid for a function of this linkage type.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 264.14: 1506-754 (S) The parameter type is not valid for a function of this linkage type.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 265.14: 1506-754 (S) The parameter type is not valid for a function of this linkage type.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 267.14: 1506-754 (S) The parameter type is not valid for a function of this linkage type.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 268.14: 1506-754 (S) The parameter type is not valid for a function of this linkage type.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 302.19: 1506-196 (S) Initialization between types "double" and "struct {...}" is not allowed.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 303.19: 1506-196 (S) Initialization between types "double" and "struct {...}" is not allowed.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 304.18: 1506-196 (S) Initialization between types "double" and "struct {...}" is not allowed.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 305.18: 1506-196 (S) Initialization between types "double" and "struct {...}" is not allowed.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 362.5: 1506-068 (S) Operation between types "double" and "struct {...}" is not allowed.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 362.5: 1506-068 (S) Operation between types "double" and "struct {...}" is not allowed.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 362.5: 1506-068 (S) Operation between types "double" and "struct {...}" is not allowed.
"../config/bgq/kernels/3/bli_gemm_int_8x8.c", line 362.5: 1506-068 (S) Operation between types "double" and "struct {...}" is not allowed.
...
Hi, I'm trying to compile BLIS on my server, which has a Xeon E5-2620v2 CPU, and I want to use ICC 2016. I used the following command to configure:
./configure CC=icc sandybridge
But when I ran make, it reported:
config/sandybridge/make_defs.mk:84: *** gcc is required for this configuration.. Stop.
It seems that the file in configure/sandybridge did not change.
Is there anything wrong with my command? Please help, many thanks!
Due to the current implementation of libblis_test_mobj_create(), tests with column-stored matrix operands cause those operands to be created with aligned leading dimensions. However, when row storage is tested, matrix operands are created with leading dimensions that are NOT aligned. This is because bli_obj_create() applies alignment to the default storage case (when "0, 0" is passed in for rs, cs), which currently is column storage. However, when explicit strides are passed in, such as is necessary in order to request row storage, alignment is not applied.
Proposed solution: Add a new parameter to input.general that controls globally whether the test suite will align its operands or not. Then, update libblis_test_mobj_create() so that it keys off of this parameter and then manually aligns the strides (using the SIMD alignment value), if needed, regardless of whether row, column, or general storage is being used, and then passes those strides into bli_obj_create().
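The manual stride alignment proposed above could look roughly like the following sketch. The helper names and the strides_t type are hypothetical, general storage is left out, and the alignment is assumed to be a multiple of the element size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: round dim up so that dim * elem_size is a multiple
   of align_bytes. Assumes align_bytes is a multiple of elem_size. */
static size_t align_dim(size_t dim, size_t elem_size, size_t align_bytes)
{
    assert(elem_size > 0 && align_bytes >= elem_size);
    assert(align_bytes % elem_size == 0);
    size_t per = align_bytes / elem_size;      /* elements per aligned unit */
    return (dim + per - 1) / per * per;
}

typedef struct { size_t rs, cs; } strides_t;   /* row stride, column stride */

/* Compute explicit, aligned strides for an m x n operand so that row and
   column storage receive the same treatment. */
static strides_t aligned_strides(size_t m, size_t n, size_t elem_size,
                                 size_t align_bytes, int row_stored)
{
    strides_t s;
    if (row_stored) {
        s.rs = align_dim(n, elem_size, align_bytes);  /* padded row length */
        s.cs = 1;
    } else {
        s.rs = 1;
        s.cs = align_dim(m, elem_size, align_bytes);  /* padded column length */
    }
    return s;
}
```

The resulting strides would then be passed explicitly to the object constructor instead of relying on the "0, 0" default.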
The "new BLAS-like API" link in the README is empty and leads to a 404.
Should it instead point to https://github.com/flame/blis/wiki/BLISAPIQuickReference?
For gemm ukernels which prefer contiguous rows, the temporary buffer for edge cases (when m != mr or n != nr) is treated as column-major (i.e. rs_c = 1 and cs_c = mr). If the general-stride pathway in the ukernel is very slow then this might have a performance impact for small to medium size matrices.
The ukernel wrapper should treat the temporary buffer as row-major (rs_c = nr and cs_c = 1) in this case.
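The stride choice described above can be sketched as follows. The function and type names are hypothetical and are not taken from the BLIS ukernel wrapper; the row-preference flag is assumed to be known to the caller.

```c
#include <assert.h>

typedef struct { long rs_c, cs_c; } ct_strides_t;  /* strides of the temporary tile */

/* Pick strides for the mr x nr temporary buffer so that it matches the
   storage the micro-kernel prefers. */
static ct_strides_t tmp_buffer_strides(int ukr_prefers_rows, long mr, long nr)
{
    assert(mr > 0 && nr > 0);
    ct_strides_t s;
    if (ukr_prefers_rows) {
        s.rs_c = nr;   /* row-major: consecutive rows are nr elements apart */
        s.cs_c = 1;
    } else {
        s.rs_c = 1;    /* column-major: consecutive columns are mr elements apart */
        s.cs_c = mr;
    }
    return s;
}
```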
When I try to test for BLIS's support for dgemm_, I see the following link error, which does not seem to be covered by your build system wiki:
"""
/usr/bin/gcc-4.9 -DCHECK_FUNCTION_EXISTS=dgemm_ CMakeFiles/cmTryCompileExec3961259244.dir/CheckFunctionExists.c.o -o cmTryCompileExec3961259244 -rdynamic /home/poulson/Install/lib/libblis.a -lpthread -lm
/home/poulson/Install/lib/libblis.a(bli_init.o): In function `bli_init':
bli_init.c:(.text+0xb4): undefined reference to `GOMP_critical_name_start'
bli_init.c:(.text+0xc6): undefined reference to `GOMP_critical_name_end'
/home/poulson/Install/lib/libblis.a(bli_init.o): In function `bli_finalize':
bli_init.c:(.text+0x1d0): undefined reference to `GOMP_critical_name_start'
bli_init.c:(.text+0x1e2): undefined reference to `GOMP_critical_name_end'
/home/poulson/Install/lib/libblis.a(bli_mem.o): In function `bli_mem_acquire_m':
bli_mem.c:(.text+0x77): undefined reference to `GOMP_critical_name_start'
bli_mem.c:(.text+0x8f): undefined reference to `GOMP_critical_name_end'
/home/poulson/Install/lib/libblis.a(bli_mem.o): In function `bli_mem_release':
bli_mem.c:(.text+0xfa): undefined reference to `GOMP_critical_name_start'
bli_mem.c:(.text+0x112): undefined reference to `GOMP_critical_name_end'
/home/poulson/Install/lib/libblis.a(bli_threading_omp.o): In function `bli_level3_thread_decorator._omp_fn.0':
bli_threading_omp.c:(.text+0x5): undefined reference to `omp_get_thread_num'
/home/poulson/Install/lib/libblis.a(bli_threading_omp.o): In function `bli_level3_thread_decorator':
bli_threading_omp.c:(.text+0x99): undefined reference to `GOMP_parallel'
collect2: error: ld returned 1 exit status
"""
To clarify, I'm referring to the level-0 object-based APIs in frame/0, not the level-0 scalar macros in frame/include/level0.
It seems strange for configure not to default to the auto configuration. It took a bit of digging for me to find that this was supported. Why not attempt the auto-detection instead of failing?
Another one for you, following up on #9
When I try to build the Sandy Bridge configuration on Windows in MSYS2 with MinGW compiler (on an i7-2630QM), I get a failure to link the test executable:
Archiving lib/sandybridge/libblis.a
Linking test_libblis.x against './lib/sandybridge/libblis.a -lm'
./lib/sandybridge/libblis.a(bli_gemm_cntl.o):bli_gemm_cntl.c:(.text+0x1bc): undefined reference to `bli_dgemm_opt_8x4_ref_u4_nodupl_avx1'
./lib/sandybridge/libblis.a(bli_gemm_ukernel.o):bli_gemm_ukernel.c:(.text+0x11): undefined reference to `bli_dgemm_opt_8x4_ref_u4_nodupl_avx1'
./lib/sandybridge/libblis.a(bli_gemmtrsm_l_ukr_ref.o):bli_gemmtrsm_l_ukr_ref.c:(.text+0x10d): undefined reference to `bli_dgemm_opt_8x4_ref_u4_nodupl_avx1'
./lib/sandybridge/libblis.a(bli_gemmtrsm_u_ukr_ref.o):bli_gemmtrsm_u_ukr_ref.c:(.text+0x10d): undefined reference to `bli_dgemm_opt_8x4_ref_u4_nodupl_avx1'
./lib/sandybridge/libblis.a(bli_gemm4m_ukr_ref.o):bli_gemm4m_ukr_ref.c:(.text+0xe94): undefined reference to `bli_dgemm_opt_8x4_ref_u4_nodupl_avx1'
./lib/sandybridge/libblis.a(bli_gemm4m_ukr_ref.o):bli_gemm4m_ukr_ref.c:(.text+0xedb): more undefined references to `bli_dgemm_opt_8x4_ref_u4_nodupl_avx1' follow
d:/code/mingw-builds/x64-4.8.1-win32-seh-rev5/mingw64/bin/../lib/gcc/x86_64-w64-mingw32/4.8.1/../../../../x86_64-w64-mingw32/bin/ld.exe: ./lib/sandybridge/libblis.a(bli_gemm4m_ukr_ref.o): bad reloc address 0x0 in section `.pdata'
collect2.exe: error: ld returned 1 exit status
Makefile:531: recipe for target 'test_libblis.x' failed
make: *** [test_libblis.x] Error 1
If I try in Cygwin, setting CC := x86_64-w64-mingw32-gcc and AR := x86_64-w64-mingw32-ar in config/sandybridge/make_defs.mk to use the MinGW cross-compiler, then the executable links correctly but segfaults when running the tests. The backtrace is more interesting if I set BLIS_SIMD_ALIGN_SIZE to 1 in config/sandybridge/config.h, since my patch in #9 didn't completely fix the alignment problems. Backtrace with alignment=1 (also uncommented CDBGFLAGS := -g to get debug info) posted here https://gist.github.com/tkelman/25d290b131c0a1205b27. Everything passes until blis_dgemm_nn_ccc. The same bli_dgemm_opt_8x4_ref_u4_nodupl_avx1 that was an undefined reference in MSYS2 is causing the segfault in Cygwin-to-MinGW cross-compile.
There is a long history of proposals to include simple modifications of ?syrk and ?herk to support C += alpha A (D A)^T, where D is diagonal. This kernel turns out to be important for Interior Point Methods, which often make use of LDL factorizations (without pivoting) of modified saddle-point systems.
Does such a routine already exist in BLIS? If not, is there a good place to start for adding support? Or a preferred name/convention?
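For reference, a naive version of such an update might look like the sketch below. It assumes the intended operation is the symmetric form C += alpha A D A^T with A of size m x k, D a k x k diagonal stored as a vector d, column-major storage, and only the lower triangle of C updated. The function name is hypothetical and this is not an existing BLIS routine.

```c
#include <assert.h>
#include <stddef.h>

/* Naive reference: C += alpha * A * diag(d) * A^T, lower triangle only.
   A is m x k with leading dimension lda; C is m x m with leading
   dimension ldc; both are column-major. */
static void syrk_diag_ref(size_t m, size_t k, double alpha,
                          const double *a, size_t lda,
                          const double *d,
                          double *c, size_t ldc)
{
    assert(lda >= m && ldc >= m);
    for (size_t j = 0; j < m; ++j) {
        for (size_t i = j; i < m; ++i) {
            double sum = 0.0;
            for (size_t p = 0; p < k; ++p)
                sum += a[i + p * lda] * d[p] * a[j + p * lda];
            c[i + j * ldc] += alpha * sum;
        }
    }
}
```

An optimized version would presumably fold the scaling by d into the packing of one operand rather than forming D A explicitly, but that is a design guess rather than something stated in the issue.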
Having a CMake build system available makes it a lot easier to build with a variety of compilers in a variety of environments. In particular, it makes it a lot easier to build things on Windows.
Is there interest in this?
I'm considering implementing this myself, though it'll probably take a while since I have several side projects going at the moment.
Because the test suite always exits with code 0 (success), it is hard to use it as a unit test. The test suite should return a non-zero exit code if any of the tests failed.
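A minimal sketch of the requested behavior, assuming the test driver keeps some record of per-test outcomes. The test_result_t type and function name are hypothetical and do not reflect the actual test suite's data structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical per-test record. */
typedef struct {
    const char *name;
    int         passed;   /* non-zero if the test passed */
} test_result_t;

/* Map a set of results to a process exit status: EXIT_FAILURE if any
   test failed, EXIT_SUCCESS otherwise. */
static int exit_status_for(const test_result_t *results, size_t n)
{
    assert(results != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        if (!results[i].passed)
            return EXIT_FAILURE;
    }
    return EXIT_SUCCESS;
}
```

The driver's main() would then end with something like `return exit_status_for(results, n);`, which lets make, CI scripts, and shell pipelines detect failures.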
Commands:
./configure -t pthreads sandybridge
make
First error is:
In file included from ./frame/base/bli_threading.h:88:0,
from ./frame/include/blis.h:73,
from config/sandybridge/kernels/3/bli_gemm_asm_d8x4.c:38:
./frame/base/bli_threading_pthreads.h:43:13: error: conflicting types for ‘pthread_barrierattr_t’
typedef int pthread_barrierattr_t;
^
In file included from /usr/include/pthread.h:26:0,
from ./frame/base/bli_threading_pthreads.h:40,
from ./frame/base/bli_threading.h:88,
from ./frame/include/blis.h:73,
from config/sandybridge/kernels/3/bli_gemm_asm_d8x4.c:38:
/usr/include/x86_64-linux-gnu/bits/pthreadtypes.h:249:3: note: previous declaration of ‘pthread_barrierattr_t’ was here
} pthread_barrierattr_t;
^
versions:
Hi,
I compiled BLIS 0.1.8 on our BG/Q system without any modification, but I got the following failures when I ran the test. The complete test log is given here: https://goo.gl/vGXz6S . (libblis.a binary: https://goo.gl/8lQ999 )
Note: The version of our software stack is V1R2M0, and jobs are submitted interactively through the SLURM scheduler.
e.g.
...
blis_cgemm4mh_ct_ccc 200 200 200 0.582 1.28e-04 FAILURE
blis_cgemm4mh_ch_ccc 100 100 100 0.515 3.80e-04 FAILURE
blis_cgemm4mh_ch_ccc 200 200 200 0.582 1.30e-04 FAILURE
blis_cgemm4mh_tn_ccc 100 100 100 0.528 3.60e-04 FAILURE
...
blis_zgemm4mh_hc_ccc 200 200 200 2.446 1.40e-01 FAILURE
blis_zgemm4mh_hc_ccc 300 300 300 2.798 1.51e-01 FAILURE
blis_zgemm4mh_hc_ccc 400 400 400 3.141 1.72e-01 FAILURE
blis_zgemm4mh_ht_ccc 100 100 100 1.449 1.43e-01 FAILURE
blis_zgemm4mh_ht_ccc 200 200 200 2.308 1.56e-01 FAILURE
blis_zgemm4mh_ht_ccc 300 300 300 2.682 1.63e-01 FAILURE
...
blis_zsymm4mh_ruch_ccc 100 100 1.449 3.44e-01 FAILURE
blis_zsymm4mh_ruch_ccc 200 200 2.332 3.72e-01 FAILURE
blis_zsymm4mh_ruch_ccc 300 300 2.718 3.99e-01 FAILURE
blis_zsymm4mh_ruch_ccc 400 400 3.051 3.70e-01 FAILURE
blis_csyrk4mh_ln_cc 100 100 0.421 3.72e-04 FAILURE
blis_csyrk4mh_ln_cc 200 200 0.528 1.26e-04 FAILURE
blis_csyrk4mh_lc_cc 100 100 0.421 3.38e-04 FAILURE
blis_csyrk4mh_lc_cc 200 200 0.527 1.25e-04 FAILURE
blis_csyrk4mh_lt_cc 100 100 0.439 3.34e-04 FAILURE
...
Please advise.
Thanks!
Rgds,
Dominic Chien
For example, if one does configure knc instead of configure mic, the result is:
[dam879@stampede knc]$ ~/src/blis/configure -p `pwd`/install knc
configure: checking whether we need to update the version file.
configure: checking version file '/home1/02742/dam879/src/blis/version'.
configure: starting configuration of BLIS 0.2.0.
configure: manual configuration requested.
configure: configuring with 'knc' configuration sub-directory.
configure: using install prefix '/home1/02742/dam879/build/blis/knc/install'.
configure: debug symbols disabled.
configure: disabling verbose make output, enable with 'make V=1'.
configure: building BLIS as a static library.
configure: threading is disabled.
configure: the CBLAS compatibility layer is disabled.
configure: the BLAS compatibility layer is enabled.
configure: the internal integer size is automatically determined.
configure: the BLAS/CBLAS interface integer size is 32-bit.
configure: creating ./config.mk from /home1/02742/dam879/src/blis/build/config.mk.in
configure: creating ./bli_config.h from /home1/02742/dam879/src/blis/build/bli_config.h.in
configure: creating ./obj/knc
configure: creating ./obj/knc/config
configure: creating ./obj/knc/frame
configure: creating ./obj/knc/testsuite
configure: creating ./lib/knc
configure: mirroring /home1/02742/dam879/src/blis/config/knc to ./obj/knc/config
ls: cannot access /home1/02742/dam879/src/blis/config/knc: No such file or directory
configure: mirroring /home1/02742/dam879/src/blis/frame to ./obj/knc/frame
configure: creating makefile fragment in /home1/02742/dam879/src/blis/config/knc
ls: cannot access /home1/02742/dam879/src/blis/config/knc: No such file or directory
ls: cannot access /home1/02742/dam879/src/blis/config/knc: No such file or directory
/home1/02742/dam879/src/blis/build/gen-make-frags/gen-make-frag.sh: line 230: /home1/02742/dam879/src/blis/config/knc/.fragment.mk: No such file or directory
ls: cannot access /home1/02742/dam879/src/blis/config/knc: No such file or directory
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/0
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/0/copysc
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1/kernels
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1/packv
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1/scalv
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1/unpackv
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1d
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1f
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1f/kernels
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1m
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1m/packm
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1m/packm/ukernels
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1m/scalm
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1m/unpackm
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/1m/unpackm/ukernels
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2/gemv
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2/ger
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2/hemv
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2/her
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2/her2
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2/symv
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2/syr
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2/syr2
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2/trmv
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/2/trsv
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/gemm
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/gemm/ind
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/hemm
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/her2k
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/herk
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/symm
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/syr2k
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/syrk
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/trmm
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/trmm3
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/trsm
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/3/ukernels
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/base
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/base/check
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/base/noopt
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/cntl
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/compat
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/compat/cblas
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/compat/cblas/f77_sub
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/compat/cblas/src
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/compat/check
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/compat/f2c
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/compat/f2c/util
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/include
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/include/level0
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/include/level0/io
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/include/level0/ri
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/include/level0/ri3
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/include/level0/rih
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/include/level0/ro
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/include/level0/rpi
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/ind
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/ind/cntx
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/ind/include
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/ind/oapi
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/ind/tapi
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/ind/ukernels
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/ind/ukernels/gemm
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/ind/ukernels/trsm
configure: creating makefile fragment in /home1/02742/dam879/src/blis/frame/util
configure: creating symbolic link to Makefile.
configure: creating symbolic link to common.mk.
configure: configured to build outside of source distribution.
Without looking carefully, this seems to indicate a successful configuration, even though the knc configuration directory does not exist. Failure in this case should be quick and obvious.
The BLAS interface does not appear to work for dtrsm when one of the input matrices has a zero dimension. It appears to be the result of bli_trsm calling bli_obj_create_with_attached_buffer on the input matrices, which leads to a check that improperly aborts if the corresponding buffer is null, even if the matrix dimensions were zero. I would assume that this bug affects a large number of routines.
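A check that tolerates null buffers for empty matrices could be sketched like this. The function name is hypothetical; it is not the actual BLIS error-checking routine, only an illustration of the condition described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical check: a null buffer is acceptable when the matrix has no
   elements (m == 0 or n == 0); otherwise the buffer must be non-null.
   Returns non-zero if the combination is acceptable. */
static int buffer_is_acceptable(const void *buf, long m, long n)
{
    assert(m >= 0 && n >= 0);
    if (m == 0 || n == 0)
        return 1;          /* nothing will be read or written */
    return buf != NULL;
}
```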
packm implicitly assumes that the register blocksizes are both non-unit. While this is not a bad assumption in practice, it would be nice to lift this constraint so that the right thing happens even if MR or NR (or both) happen to be 1. The problem boils down to the definition of the bli_is_row_stored_f() and bli_is_col_stored_f() macros, which only look at the row and column strides [of the packed micro-panel]. Naturally, if both are unit, then a "row-stored" mx1 micro-panel is indistinguishable from a "column-stored" 1xn micro-panel.
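The ambiguity can be illustrated with simplified stride-only predicates. These are hypothetical stand-ins, not the definitions of the BLIS macros, and the explicit storage tag is just one possible way to disambiguate.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stride-only predicates (assumed for illustration). */
static bool strides_look_row_stored(long rs, long cs) { (void)rs; return cs == 1; }
static bool strides_look_col_stored(long rs, long cs) { (void)cs; return rs == 1; }

/* One possible alternative: carry the storage format explicitly instead
   of inferring it from the strides. */
typedef enum { PANEL_ROW_STORED, PANEL_COL_STORED } panel_storage_t;

typedef struct {
    long            rs, cs;
    panel_storage_t storage;
} panel_desc_t;

static bool panel_is_row_stored(const panel_desc_t *p)
{
    return p->storage == PANEL_ROW_STORED;
}

static void ambiguity_demo(void)
{
    /* With a unit register blocksize both strides are 1, so both
       stride-only predicates hold at once. */
    assert(strides_look_row_stored(1, 1) && strides_look_col_stored(1, 1));

    /* The explicit tag stays unambiguous. */
    panel_desc_t p = { 1, 1, PANEL_COL_STORED };
    assert(!panel_is_row_stored(&p));
}
```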
It seems that the linker on OSX is not able to find any of the constants defined in bli_const.c, such as BLIS_ONE etc. The problem, according to this page is that undefined constants end up as "common" symbols which are ignored by the OSX linker. The two available solutions seem to be:
1. [...] obj_t's, default-initialization with ...={} should be OK (and they are initialized for real later).
2. [...] -fno-common. This would have to be added to each configuration or in common.mk.
I am not sure when this problem first appeared, since I thought I had successfully compiled after the "big commit", but I am seeing it in my branch based off of cbcd0b7. Using gcc-5.3.0 from Homebrew instead of the system "gcc" may also be a contributing factor.
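The distinction between a tentative definition and an explicitly initialized one can be sketched as follows. The const_obj_t type and the DEMO_* names are hypothetical stand-ins for obj_t and the BLIS constants. Note that the empty initializer `={}` is a GNU extension (standardized only in C23), so the sketch uses `{ 0 }` instead.

```c
#include <assert.h>

/* Hypothetical stand-in for obj_t. */
typedef struct {
    double value;
    int    is_set;
} const_obj_t;

/* Tentative definition: no initializer. With -fcommon (the GCC default
   before GCC 10) this may be emitted as a "common" symbol. */
const_obj_t DEMO_ZERO_TENTATIVE;

/* Explicit initializer: this is a real definition and is emitted as
   ordinary initialized data, not as a common symbol. */
const_obj_t DEMO_ONE = { 0 };

/* The "real" initialization still happens later, at library init time. */
static void demo_init(void)
{
    DEMO_ONE.value  = 1.0;
    DEMO_ONE.is_set = 1;
    assert(DEMO_ONE.is_set == 1);
}
```

Compiling with -fno-common has a similar effect without touching the source, since uninitialized globals are then placed in the object's data section instead of being treated as common symbols.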
Compiling frame/base/bli_threading_omp.c
frame/base/bli_threading_omp.c: In function 'bli_barrier':
frame/base/bli_threading_omp.c:88: error: expected end of line before 'capture'
frame/base/bli_threading_omp.c:89: error: invalid operator for '#pragma omp atomic' before '=' token
make: *** [obj/sandybridge/frame/base/bli_threading_omp.o] Error 1
I know that earlier Intel compilers did not support the full OMP 3.1 standard, but this is Intel 15.
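The messages suggest the compiler is rejecting `#pragma omp atomic capture`, which was introduced in OpenMP 3.1. One way to guard against that, sketched here with a hypothetical function that is not the actual bli_barrier code, is to test the `_OPENMP` version macro (201107 corresponds to OpenMP 3.1) and fall back to a critical section.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: atomically read the counter and increment it,
   returning the value seen before the increment. */
static int fetch_and_increment(int *counter)
{
    assert(counter != NULL);
    int old;
#if defined(_OPENMP) && _OPENMP >= 201107
    /* OpenMP 3.1 or later: atomic capture is available. */
    #pragma omp atomic capture
    { old = *counter; *counter += 1; }
#else
    /* Older OpenMP: use a critical section instead. Without OpenMP the
       pragma is ignored and the code runs serially. */
    #pragma omp critical
    { old = *counter; *counter += 1; }
#endif
    return old;
}
```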
When compiling with pthreads now, I get a bunch of warnings and errors like the below. If I back up to the commit ce59f81, these go away.
In file included from ./frame/thread/bli_mutex.h:43:0,
from ./frame/thread/bli_thread.h:56,
from ./frame/include/blis.h:74,
from frame/base/bli_membrk.c:36:
frame/base/bli_membrk.c: In function ‘bli_membrk_init’:
./frame/base/bli_membrk.h:56:2: warning: passing argument 1 of ‘pthread_mutex_init’ from incompatible pointer type
( &( (membrk_p)->mutex ) )
^
./frame/thread/bli_mutex_pthreads.h:53:22: note: in definition of macro ‘bli_mutex_init’
pthread_mutex_init( mtx_p );
^
frame/base/bli_membrk.c:44:18: note: in expansion of macro ‘bli_membrk_mutex’
bli_mutex_init( bli_membrk_mutex( membrk ) );
^
In file included from ./frame/thread/bli_mutex_pthreads.h:42:0,
from ./frame/thread/bli_mutex.h:43,
from ./frame/thread/bli_thread.h:56,
from ./frame/include/blis.h:74,
from frame/base/bli_membrk.c:36:
/usr/include/pthread.h:723:12: note: expected ‘union pthread_mutex_t *’ but argument is of type ‘struct mtx_s *’
extern int pthread_mutex_init (pthread_mutex_t *__mutex,
^
In file included from ./frame/thread/bli_mutex.h:43:0,
from ./frame/thread/bli_thread.h:56,
from ./frame/include/blis.h:74,
from frame/base/bli_membrk.c:36:
./frame/thread/bli_mutex_pthreads.h:53:2: error: too few arguments to function ‘pthread_mutex_init’
pthread_mutex_init( mtx_p );
^
frame/base/bli_membrk.c:44:2: note: in expansion of macro ‘bli_mutex_init’
bli_mutex_init( bli_membrk_mutex( membrk ) );
^
In file included from ./frame/thread/bli_mutex_pthreads.h:42:0,
from ./frame/thread/bli_mutex.h:43,
from ./frame/thread/bli_thread.h:56,
from ./frame/include/blis.h:74,
from frame/base/bli_membrk.c:36:
/usr/include/pthread.h:723:12: note: declared here
extern int pthread_mutex_init (pthread_mutex_t *__mutex,
^
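The diagnostics point to two problems: the wrapper struct itself is passed where a pthread_mutex_t pointer is expected, and the attribute argument is missing. A minimal sketch of a wrapper that avoids both, assuming the wrapper struct simply contains a pthread_mutex_t member (the demo_* names are hypothetical and the real layout of the BLIS struct is not shown in the log):

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical wrapper type holding the underlying pthread mutex. */
typedef struct {
    pthread_mutex_t mutex;
} demo_mtx_t;

/* pthread_mutex_init() takes the mutex and an attribute pointer;
   NULL requests the default attributes. */
static int demo_mutex_init(demo_mtx_t *m)
{
    assert(m != NULL);
    return pthread_mutex_init(&m->mutex, NULL);
}

static int demo_mutex_finalize(demo_mtx_t *m)
{
    assert(m != NULL);
    return pthread_mutex_destroy(&m->mutex);
}
```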
For GNU gcc, this takes the form of the -pg option. It should be disabled by default.
Hello,
It seems libblis is not thread safe. I have a gemm invocation:
static inline void matmul(cv::Mat &c, cv::Mat &a, cv::Mat b)
{
    float alphap = 1.0;
    //since beta is zero, we don't need to init c to zero
    float betap = 0.0;
    cntx_t cntx;
    bli_gemm_cntx_init(&cntx);
    bli_sgemm(BLIS_NO_TRANSPOSE, BLIS_NO_TRANSPOSE, a.rows, b.cols, a.cols,
              &alphap,
              (float *)a.data, a.cols, 1,
              (float *)b.data, b.cols, 1,
              &betap,
              (float *)c.data, b.cols, 1, &cntx);
    bli_gemm_cntx_finalize(&cntx);
}
Several threads invoke matmul concurrently, and it then crashes as shown below: the parameter p has been changed to 0. If I change the program so that only one thread invokes matmul, everything works.
#0 0x0000000000545abc in bli_spackm_6xk_ref (conja=BLIS_NO_CONJUGATE, n=25, kappa=0x7fffec000dc0, a=0x7fffb8091540, inca=25, lda=1, p=0x0, ldp=6)
at frame/1m/packm/ukernels/bli_packm_cxk_ref.c:414
#1 0x00000000004ee461 in bli_spackm_cxk (conja=BLIS_NO_CONJUGATE, panel_dim=6, panel_len=25, kappa=0x7fffec000dc0, a=0x7fffb8091540, inca=25, lda=1, p=0x0, ldp=6,
cntx=0x7fffd247c190) at frame/1m/packm/bli_packm_cxk.c:216
#2 0x00000000004c4806 in bli_spackm_struc_cxk (strucc=BLIS_GENERAL, diagoffc=0, diagc=BLIS_NONUNIT_DIAG, uploc=BLIS_DENSE, conjc=BLIS_NO_CONJUGATE,
schema=BLIS_PACKED_COL_PANELS, invdiag=0, m_panel=25, n_panel=6, m_panel_max=25, n_panel_max=6, kappa=0x7fffec000dc0, c=0x7fffb8091540, rs_c=1, cs_c=25, p=0x0,
rs_p=6, cs_p=1, is_p=1, cntx=0x7fffd247c190) at frame/1m/packm/bli_packm_struc_cxk.c:255
#3 0x00000000004bbbe6 in bli_spackm_blk_var1 (strucc=BLIS_GENERAL, diagoffc=0, diagc=BLIS_NONUNIT_DIAG, uploc=BLIS_DENSE, transc=BLIS_NO_TRANSPOSE,
schema=BLIS_PACKED_COL_PANELS, invdiag=0, revifup=0, reviflo=0, m=25, n=1936, m_max=25, n_max=1938, kappa=0x7fffec000dc0, c=0x7fffb8091540, rs_c=1, cs_c=25, p=0x0,
rs_p=6, cs_p=1, is_p=1, pd_p=6, ps_p=150, packm_ker=0x4c46f9 <bli_spackm_struc_cxk>, cntx=0x7fffd247c190, thread=0x7fffb8007600)
at frame/1m/packm/bli_packm_blk_var1.c:668
#4 0x00000000004bb133 in bli_packm_blk_var1 (c=0x7fffd2479e50, p=0x7fffd24798e0, cntx=0x7fffd247c190, t=0x7fffb8007600) at frame/1m/packm/bli_packm_blk_var1.c:234
#5 0x00000000004aed11 in bli_packm_int (a=0x7fffd2479e50, p=0x7fffd24798e0, cntx=0x7fffd247c190, cntl=0x7fffec002c00, thread=0x7fffb8007600)
at frame/1m/packm/bli_packm_int.c:125
#6 0x00000000004b23ca in bli_gemm_blk_var1f (a=0x7fffd2479d80, b=0x7fffd2479e50, c=0x7fffd2479f20, cntx=0x7fffd247c190, cntl=0x7fffec002d00, thread=0x7fffb8007660)
at frame/3/gemm/bli_gemm_blk_var1f.c:79
#7 0x00000000004488b2 in bli_gemm_int (alpha=0x7c66a0 <BLIS_ONE>, a=0x7fffd247a160, b=0x7fffd247a230, beta=0x7c66a0 <BLIS_ONE>, c=0x7fffd247a090, cntx=0x7fffd247c190,
cntl=0x7fffec002d00, thread=0x7fffb8007660) at frame/3/gemm/bli_gemm_int.c:154
#8 0x00000000004b304b in bli_gemm_blk_var3f (a=0x7fffd247a530, b=0x7fffd247a600, c=0x7fffd247a6d0, cntx=0x7fffd247c190, cntl=0x7fffec002da0, thread=0x7fffb80c0ae0)
at frame/3/gemm/bli_gemm_blk_var3f.c:121
#9 0x00000000004488b2 in bli_gemm_int (alpha=0x7c66a0 <BLIS_ONE>, a=0x7fffd247a840, b=0x7fffd247a910, beta=0x7c66a0 <BLIS_ONE>, c=0x7fffd247a9e0, cntx=0x7fffd247c190,
cntl=0x7fffec002da0, thread=0x7fffb80c0ae0) at frame/3/gemm/bli_gemm_int.c:154
#10 0x00000000004b2b28 in bli_gemm_blk_var2f (a=0x7fffd247ace0, b=0x7fffd247adb0, c=0x7fffd247ae80, cntx=0x7fffd247c190, cntl=0x7fffec002e40, thread=0x7fffb80c0ca0)
at frame/3/gemm/bli_gemm_blk_var2f.c:123
#11 0x00000000004488b2 in bli_gemm_int (alpha=0x7fffd247bcf0, a=0x7fffd247b0c0, b=0x7fffd247b190, beta=0x7fffd247bf60, c=0x7fffd247b260, cntx=0x7fffd247c190,
cntl=0x7fffec002e40, thread=0x7fffb80c0ca0) at frame/3/gemm/bli_gemm_int.c:154
#12 0x0000000000423d60 in bli_level3_thread_decorator (n_threads=1, func=0x447f75 <bli_gemm_int>, alpha=0x7fffd247bcf0, a=0x7fffd247b0c0, b=0x7fffd247b190,
beta=0x7fffd247bf60, c=0x7fffd247b260, cntx=0x7fffd247c190, cntl=0x7fffec002e40, thread=0x7fffb80070c0) at frame/base/bli_threading.c:92
#13 0x0000000000447f5a in bli_gemm_front (alpha=0x7fffd247bcf0, a=0x7fffd247bdc0, b=0x7fffd247be90, beta=0x7fffd247bf60, c=0x7fffd247c030, cntx=0x7fffd247c190,
cntl=0x7fffec002e40) at frame/3/gemm/bli_gemm_front.c:86
#14 0x0000000000429be5 in bli_gemmnat (alpha=0x7fffd247bcf0, a=0x7fffd247bdc0, b=0x7fffd247be90, beta=0x7fffd247bf60, c=0x7fffd247c030, cntx=0x7fffd247c190)
at frame/ind/oapi/bli_l3_nat_oapi.c:80
#15 0x000000000049242b in bli_gemmind (alpha=0x7fffd247bcf0, a=0x7fffd247bdc0, b=0x7fffd247be90, beta=0x7fffd247bf60, c=0x7fffd247c030, cntx=0x7fffd247c190)
at frame/ind/oapi/bli_l3_ind_oapi.c:59
#16 0x000000000044701e in bli_gemm_ex (alpha=0x7fffd247bcf0, a=0x7fffd247bdc0, b=0x7fffd247be90, beta=0x7fffd247bf60, c=0x7fffd247c030, cntx=0x7fffd247c190)
at frame/3/bli_l3_oapi.c:74
#17 0x0000000000419e3c in bli_sgemm (transa=BLIS_NO_TRANSPOSE, transb=BLIS_NO_TRANSPOSE, m=1936, n=16, k=25, alpha=0x7fffd247c188, a=0x7fffb8091540, rs_a=25, cs_a=1,
b=0x7fffb8003790, rs_b=16, cs_b=1, beta=0x7fffd247c18c, c=0x7fffb8040e30, rs_c=16, cs_c=1, cntx=0x7fffd247c190) at frame/3/bli_l3_tapi.c:93
Not being a programmer, I'd like to ask: what might the problem be?
Type 'q()' to quit R.
PR#4582 %*% with NAs
stopifnot(is.na(NA %*% 0), is.na(0 %*% NA))
depended on the BLAS in use.
found from fallback test in slam 0.1-15
most likely indicates an inadequate BLAS.
x <- matrix(c(1, 0, NA, 1), 2, 2)
y <- matrix(c(1, 0, 0, 2, 1, 0), 3, 2)
(z <- tcrossprod(x, y))
[,1] [,2] [,3]
[1,] NA NA 0
[2,] 2 1 0
stopifnot(identical(z, x %*% t(y)))
Error: identical(z, x %*% t(y)) is not TRUE
Because Apple chose to alias gcc to clang and g++ to clang++, the current Sandybridge configuration,
blis/config/sandybridge/make_defs.mk
Line 79 in ec25807
The following operations fail with the dunnington and sandybridge configurations (and probably haswell too, but not tested) using icc 16.0.1 and compiling with CVECOPTS = '-xSSE4.2' and '-xAVX' respectively:
cgemm4mh
chemm4mh
csymm4mh
csyrk4mh
csyr2k4mh
ctrmm34mh
XLC is the only compiler I've ever seen that gives these warnings, but I think they are formally legit. On the other hand, they may be false positives in practice.
[jhammond@vestalac1 blis]$ ./configure -p $HOME/BLIS/bgq bgq && make -j32 && make install
configure: checking whether we need to update the version file.
configure: checking version file './version'.
configure: found '.git' directory; assuming git clone.
configure: executing git describe --tags.
configure: got back 0.1.1-14-gd531a24.
configure: truncating to 0.1.1-14.
configure: updating version file './version'.
configure: starting configuration of BLIS 0.1.1-14.
configure: configuring with 'bgq' configuration sub-directory.
configure: detected -p option; using install prefix '/home/jhammond/BLIS/bgq'.
configure: creating ./config.mk from ./build/config.mk.in
configure: creating ./obj/bgq
configure: creating ./obj/bgq/config
configure: creating ./obj/bgq/frame
configure: creating ./obj/bgq/testsuite
configure: creating ./lib/bgq
configure: mirroring ./config/bgq to ./obj/bgq/config
configure: mirroring ./frame to ./obj/bgq/frame
configure: creating makefile fragment in ./config/bgq
configure: creating makefile fragment in ./config/bgq/kernels
configure: creating makefile fragment in ./config/bgq/kernels/1
configure: creating makefile fragment in ./config/bgq/kernels/1f
configure: creating makefile fragment in ./config/bgq/kernels/3
configure: creating makefile fragment in ./frame
configure: creating makefile fragment in ./frame/0
configure: creating makefile fragment in ./frame/0/absqsc
configure: creating makefile fragment in ./frame/0/addsc
configure: creating makefile fragment in ./frame/0/copysc
configure: creating makefile fragment in ./frame/0/divsc
configure: creating makefile fragment in ./frame/0/getsc
configure: creating makefile fragment in ./frame/0/mulsc
configure: creating makefile fragment in ./frame/0/normfsc
configure: creating makefile fragment in ./frame/0/setsc
configure: creating makefile fragment in ./frame/0/sqrtsc
configure: creating makefile fragment in ./frame/0/subsc
configure: creating makefile fragment in ./frame/0/unzipsc
configure: creating makefile fragment in ./frame/0/zipsc
configure: creating makefile fragment in ./frame/1
configure: creating makefile fragment in ./frame/1/addv
configure: creating makefile fragment in ./frame/1/axpyv
configure: creating makefile fragment in ./frame/1/copyv
configure: creating makefile fragment in ./frame/1/dotv
configure: creating makefile fragment in ./frame/1/dotxv
configure: creating makefile fragment in ./frame/1/invertv
configure: creating makefile fragment in ./frame/1/packv
configure: creating makefile fragment in ./frame/1/scal2v
configure: creating makefile fragment in ./frame/1/scalv
configure: creating makefile fragment in ./frame/1/setv
configure: creating makefile fragment in ./frame/1/subv
configure: creating makefile fragment in ./frame/1/swapv
configure: creating makefile fragment in ./frame/1/unpackv
configure: creating makefile fragment in ./frame/1d
configure: creating makefile fragment in ./frame/1d/addd
configure: creating makefile fragment in ./frame/1d/axpyd
configure: creating makefile fragment in ./frame/1d/copyd
configure: creating makefile fragment in ./frame/1d/invertd
configure: creating makefile fragment in ./frame/1d/scal2d
configure: creating makefile fragment in ./frame/1d/scald
configure: creating makefile fragment in ./frame/1d/setd
configure: creating makefile fragment in ./frame/1d/subd
configure: creating makefile fragment in ./frame/1f
configure: creating makefile fragment in ./frame/1f/axpy2v
configure: creating makefile fragment in ./frame/1f/axpyf
configure: creating makefile fragment in ./frame/1f/dotaxpyv
configure: creating makefile fragment in ./frame/1f/dotxaxpyf
configure: creating makefile fragment in ./frame/1f/dotxf
configure: creating makefile fragment in ./frame/1m
configure: creating makefile fragment in ./frame/1m/addm
configure: creating makefile fragment in ./frame/1m/axpym
configure: creating makefile fragment in ./frame/1m/copym
configure: creating makefile fragment in ./frame/1m/packm
configure: creating makefile fragment in ./frame/1m/packm/ukernels
configure: creating makefile fragment in ./frame/1m/scal2m
configure: creating makefile fragment in ./frame/1m/scalm
configure: creating makefile fragment in ./frame/1m/setm
configure: creating makefile fragment in ./frame/1m/subm
configure: creating makefile fragment in ./frame/1m/unpackm
configure: creating makefile fragment in ./frame/1m/unpackm/ukernels
configure: creating makefile fragment in ./frame/2
configure: creating makefile fragment in ./frame/2/gemv
configure: creating makefile fragment in ./frame/2/ger
configure: creating makefile fragment in ./frame/2/hemv
configure: creating makefile fragment in ./frame/2/her
configure: creating makefile fragment in ./frame/2/her2
configure: creating makefile fragment in ./frame/2/symv
configure: creating makefile fragment in ./frame/2/syr
configure: creating makefile fragment in ./frame/2/syr2
configure: creating makefile fragment in ./frame/2/trmv
configure: creating makefile fragment in ./frame/2/trsv
configure: creating makefile fragment in ./frame/3
configure: creating makefile fragment in ./frame/3/gemm
configure: creating makefile fragment in ./frame/3/gemm/3m
configure: creating makefile fragment in ./frame/3/gemm/3m/ukernels
configure: creating makefile fragment in ./frame/3/gemm/4m
configure: creating makefile fragment in ./frame/3/gemm/4m/ukernels
configure: creating makefile fragment in ./frame/3/gemm/ukernels
configure: creating makefile fragment in ./frame/3/hemm
configure: creating makefile fragment in ./frame/3/hemm/3m
configure: creating makefile fragment in ./frame/3/hemm/4m
configure: creating makefile fragment in ./frame/3/her2k
configure: creating makefile fragment in ./frame/3/her2k/3m
configure: creating makefile fragment in ./frame/3/her2k/4m
configure: creating makefile fragment in ./frame/3/herk
configure: creating makefile fragment in ./frame/3/herk/3m
configure: creating makefile fragment in ./frame/3/herk/4m
configure: creating makefile fragment in ./frame/3/symm
configure: creating makefile fragment in ./frame/3/symm/3m
configure: creating makefile fragment in ./frame/3/symm/4m
configure: creating makefile fragment in ./frame/3/syr2k
configure: creating makefile fragment in ./frame/3/syr2k/3m
configure: creating makefile fragment in ./frame/3/syr2k/4m
configure: creating makefile fragment in ./frame/3/syrk
configure: creating makefile fragment in ./frame/3/syrk/3m
configure: creating makefile fragment in ./frame/3/syrk/4m
configure: creating makefile fragment in ./frame/3/trmm
configure: creating makefile fragment in ./frame/3/trmm/3m
configure: creating makefile fragment in ./frame/3/trmm/4m
configure: creating makefile fragment in ./frame/3/trmm3
configure: creating makefile fragment in ./frame/3/trmm3/3m
configure: creating makefile fragment in ./frame/3/trmm3/4m
configure: creating makefile fragment in ./frame/3/trsm
configure: creating makefile fragment in ./frame/3/trsm/3m
configure: creating makefile fragment in ./frame/3/trsm/3m/ukernels
configure: creating makefile fragment in ./frame/3/trsm/4m
configure: creating makefile fragment in ./frame/3/trsm/4m/ukernels
configure: creating makefile fragment in ./frame/3/trsm/ukernels
configure: creating makefile fragment in ./frame/base
configure: creating makefile fragment in ./frame/base/check
configure: creating makefile fragment in ./frame/base/noopt
configure: creating makefile fragment in ./frame/cntl
configure: creating makefile fragment in ./frame/compat
configure: creating makefile fragment in ./frame/compat/check
configure: creating makefile fragment in ./frame/compat/f2c
configure: creating makefile fragment in ./frame/compat/f2c/util
configure: creating makefile fragment in ./frame/include
configure: creating makefile fragment in ./frame/include/level0
configure: creating makefile fragment in ./frame/include/level0/ri
configure: creating makefile fragment in ./frame/include/level0/ri3
configure: creating makefile fragment in ./frame/util
configure: creating makefile fragment in ./frame/util/amaxv
configure: creating makefile fragment in ./frame/util/asumv
configure: creating makefile fragment in ./frame/util/mkherm
configure: creating makefile fragment in ./frame/util/mksymm
configure: creating makefile fragment in ./frame/util/mktrim
configure: creating makefile fragment in ./frame/util/norm1m
configure: creating makefile fragment in ./frame/util/norm1v
configure: creating makefile fragment in ./frame/util/normfm
configure: creating makefile fragment in ./frame/util/normfv
configure: creating makefile fragment in ./frame/util/normim
configure: creating makefile fragment in ./frame/util/normiv
configure: creating makefile fragment in ./frame/util/printm
configure: creating makefile fragment in ./frame/util/printv
configure: creating makefile fragment in ./frame/util/randm
configure: creating makefile fragment in ./frame/util/randv
configure: creating makefile fragment in ./frame/util/sumsqv
configure: configured to build within top-level directory of source distribution.
Compiling config/bgq/kernels/1/bli_axpyv_opt_var1.c (NOTE: using flags for kernels)
Compiling frame/0/unzipsc/bli_unzipsc.c
Compiling frame/0/unzipsc/bli_unzipsc_check.c
Compiling frame/0/unzipsc/bli_unzipsc_unb_var1.c
Compiling frame/0/zipsc/bli_zipsc.c
Compiling frame/0/zipsc/bli_zipsc_check.c
Compiling frame/0/zipsc/bli_zipsc_unb_var1.c
Compiling frame/1/addv/bli_addv.c
Compiling frame/1/addv/bli_addv_check.c
Compiling frame/1/addv/bli_addv_kernel.c
Compiling frame/1/addv/bli_addv_ref.c
Compiling frame/1/axpyv/bli_axpyv.c
Compiling frame/1/axpyv/bli_axpyv_check.c
Compiling frame/1/axpyv/bli_axpyv_kernel.c
Compiling frame/1/axpyv/bli_axpyv_ref.c
Compiling frame/1/copyv/bli_copyv.c
Compiling frame/1/copyv/bli_copyv_check.c
Compiling frame/1/copyv/bli_copyv_kernel.c
Compiling frame/1/copyv/bli_copyv_ref.c
Compiling frame/1/dotv/bli_dotv.c
Compiling frame/1/dotv/bli_dotv_check.c
Compiling frame/1/dotv/bli_dotv_kernel.c
Compiling frame/1/dotv/bli_dotv_ref.c
Compiling frame/1/dotxv/bli_dotxv.c
Compiling frame/1/dotxv/bli_dotxv_check.c
Compiling frame/1/dotxv/bli_dotxv_kernel.c
Compiling frame/1/dotxv/bli_dotxv_ref.c
Compiling frame/1/invertv/bli_invertv.c
Compiling frame/1/invertv/bli_invertv_check.c
Compiling frame/1/invertv/bli_invertv_kernel.c
Compiling frame/1/invertv/bli_invertv_ref.c
Compiling frame/1/packv/bli_packv.c
Compiling frame/1/packv/bli_packv_check.c
Compiling frame/1/packv/bli_packv_cntl.c
Compiling frame/1/packv/bli_packv_init.c
"config/bgq/kernels/1/bli_axpyv_opt_var1.c", line 45.33: 1506-1418 (E) Assignment between restrict pointers "alpha" and "alpha_in" is not allowed. Only outer-to-inner scope assignments between restrict pointers are allowed.
"config/bgq/kernels/1/bli_axpyv_opt_var1.c", line 46.29: 1506-1418 (E) Assignment between restrict pointers "x" and "x_in" is not allowed. Only outer-to-inner scope assignments between restrict pointers are allowed.
"config/bgq/kernels/1/bli_axpyv_opt_var1.c", line 47.29: 1506-1418 (E) Assignment between restrict pointers "y" and "y_in" is not allowed. Only outer-to-inner scope assignments between restrict pointers are allowed.
"config/bgq/kernels/1/bli_axpyv_opt_var1.c", line 68.28: 1506-754 (S) The parameter type is not valid for a function of this linkage type.
make: *** [obj/bgq/config/kernels/1/bli_axpyv_opt_var1.o] Error 1
make: *** Waiting for unfinished jobs....
Hi, I would like to try blis. Is there a guide on how to cross compile on linux so that I can use blis on windows?
The sandybridge configuration does not compile with -dnoopt; it requires AVX.
The following would fix it:
COPTFLAGS := -O0 -march=native
Sample error, using gcc (gcc version 5.3.1 20160316 (Debian 5.3.1-12))
config/sandybridge/kernels/3/bli_gemm_int_d8x4.c: In function ‘bli_dgemm_int_8x4’:
config/sandybridge/kernels/3/bli_gemm_int_d8x4.c:111:10: warning: AVX vector return without AVX enabled changes the ABI [-Wpsabi]
va0_3b0 = _mm256_setzero_pd();
^
In file included from /usr/lib/gcc/x86_64-linux-gnu/5/include/immintrin.h:41:0,
from config/sandybridge/kernels/3/bli_gemm_int_d8x4.c:36:
/usr/lib/gcc/x86_64-linux-gnu/5/include/avxintrin.h:834:1: error: inlining failed in call to always_inline ‘_mm256_load_pd’: target specific option mismatch
_mm256_load_pd (double const *__P)
^
The license/copyright headers at the top of each source file need to be updated. The copyright year needs to be changed to "2016". (I never got around to updating it in 2015 and so it still reads "2014".)
Unfortunately, this change is going to touch virtually every file in the repository. If you have any objections, or this will disrupt your work, please speak up.
I am running the libflame benchmark test routine (test_libflame.x in the test folder). The test gets aborted because FLA_Hemv_check(), called from FLA_Hemv_external(), reports "Detecting unequal object datatypes". The data is getting corrupted prior to this call, and the culprit could be Trsm from the BLIS library.
Note: With an older version of BLIS, the test suite works just fine.
Test parameters: single precision, row-major format. FLA_Chol_solve() corrupts the data due to the call to Trsm_external.
The combination of thread pools + fork is unfortunate: it tends to lead to random freezes. Unfortunately, the best way to achieve high-level parallelism in Python is to use fork (via the multiprocessing module).
Fundamentally the problem is that if you spawn a thread pool, and then fork, then the child ends up thinking that it has a thread pool, and dispatching work to it, but there aren't actually any threads running. This doesn't end well.
When using OMP for threading, dealing with this is the responsibility of the OMP implementation, and out-of-scope for BLIS itself. But when using pthreads, this should be handled by using pthread_atfork to register a pre-fork callback that shuts down the thread pool.
The equivalent issue in OpenBLAS was fixed in OpenMathLib/OpenBLAS#343, specifically with this code (which should probably actually be called openblas_install_fork_handler...):
+void openblas_fork_handler()
+{
+ // This handler shuts down the OpenBLAS-managed PTHREAD pool when OpenBLAS is
+ // built with "make USE_OPENMP=0".
+ // Hanging can still happen when OpenBLAS is built against the libgomp
+ // implementation of OpenMP. The problem is tracked at:
+ // http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60035
+ // In the mean time build with USE_OPENMP=0 or link against another
+ // implementation of OpenMP.
+#ifndef OS_WINDOWS
+ int err;
+ err = pthread_atfork (BLASFUNC(blas_thread_shutdown), NULL, NULL);
+ if(err != 0)
+ openblas_warning(0, "OpenBLAS Warning ... cannot install fork handler. You may meet hang after fork.\n");
+#endif
+}
The attached test case can also probably be re-used with trivial tweaks to use the BLIS API instead of cblas: https://github.com/ogrisel/OpenBLAS/blob/49bd98f410369c9604031296f8ff47c5c20052bb/utest/test_fork.c
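For a pthread-based pool, the same approach can be sketched roughly as follows. This is only an illustration under stated assumptions: the one-worker pool and the names pool_start, pool_shutdown_before_fork, pool_is_running and pool_install_fork_handler are hypothetical and are not part of BLIS or OpenBLAS. Only the pthread_* calls are real POSIX functions.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-worker "pool"; a real pool would manage many threads. */
static pthread_t       worker;
static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  wake = PTHREAD_COND_INITIALIZER;
static bool            running = false;        /* is the worker alive?   */
static bool            stop_requested = false; /* set to end the worker  */

/* Worker body: sleeps until asked to stop. */
static void* worker_main(void* arg)
{
    (void)arg;
    pthread_mutex_lock(&lock);
    while (!stop_requested)
        pthread_cond_wait(&wake, &lock);
    pthread_mutex_unlock(&lock);
    return NULL;
}

/* Start the worker if it is not already running. Returns 0 on success. */
int pool_start(void)
{
    if (running) return 0;
    stop_requested = false;
    int err = pthread_create(&worker, NULL, worker_main, NULL);
    if (err == 0) running = true;
    return err;
}

/* Prepare handler: runs in the parent just before fork(), so that the
   child never inherits the belief that a live pool exists. */
void pool_shutdown_before_fork(void)
{
    if (!running) return;
    pthread_mutex_lock(&lock);
    stop_requested = true;
    pthread_cond_signal(&wake);
    pthread_mutex_unlock(&lock);
    int err = pthread_join(worker, NULL);
    assert(err == 0);
    (void)err;
    running = false;
}

int pool_is_running(void) { return running ? 1 : 0; }

/* Register the prepare handler. Returns 0 on success. */
int pool_install_fork_handler(void)
{
    return pthread_atfork(pool_shutdown_before_fork, NULL, NULL);
}
```

With this arrangement, both parent and child come out of fork() with no pool, so each would call pool_start() again (for example lazily, on the next threaded operation).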
CPU: Core2 Duo E8400 (old machine)
OS: Ubuntu 14.04, x86-64
Compiled BLIS reference configuration, setting BLIS_ENABLE_DYNAMIC_BUILD := yes. By itself, BLIS passes its own make test.
I'm calling into BLIS from Julia by the following steps:
git clone https://github.com/julialang/julia
cd julia
mkdir -p $PWD/usr/lib
cp /path/to/libblis.so $PWD/usr/lib
echo 'override USE_SYSTEM_BLAS = 1' > Make.user
echo 'override USE_BLAS64 = 0' >> Make.user
echo 'override LIBBLAS = -L$(build_libdir) -lblis -lm' >> Make.user
echo 'override LIBBLASNAME = libblis' >> Make.user
make -j8 # this will take quite a while - Julia has lots of big dependencies
make testall
This gives me a different failure each time I repeat make testall. Here are some examples, from the first couple of files in Julia's test suite (linalg1 and linalg2):
https://gist.github.com/47dbd5517c4a6f56fb2e
https://gist.github.com/51835795c8ada7c0f2a1
https://gist.github.com/552ec09e5e78d1cd3da7
https://gist.github.com/7cce0057bf9a3009a92a
https://gist.github.com/0f01364072634cc95be9
https://gist.github.com/9292fc3fe1a2e9afe09d
I'll see if I can translate a few of these test cases into C to figure out whether the problem is reproducible outside of Julia. I'll also try setting the BLIS integer size to 64-bit and deleting the USE_BLAS64 = 0 line, to see whether that changes anything.
and I check the file "bli_kernel.h" for bgq, line 174 contains
but I cannot locate this header file anywhere in the package.
Is there any follow-up on my previous ticket, "BLIS Test Failure in BlueGene/Q #34"? It seems that BLIS is still not working correctly for all complex test cases.
Are there any instructions for building LAPACK to work with BLIS?
Thanks!
As part of the configuration, BLIS should allow the developer to specify the function to call for allocating memory for the following three use cases that occur in BLIS:
bli_obj_create() and friends.
The BLIS developer would specify, in bli_kernel.h, cpp macros to identify the names of the malloc()-style functions to use in any of the above cases. It would then be the developer's responsibility to ensure that the object code that defines the malloc() substitutes is available at link-time. The developer does not need to provide a prototype for the malloc() substitutes since we will require those functions to adhere to the same function signature as malloc(), and therefore BLIS can/will provide those prototypes on behalf of the developer, similar to what is done when a developer defines custom kernels/micro-kernels.
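As a rough sketch of how such a scheme could look, assuming one macro pair per use case: the macro names EXAMPLE_MALLOC_OBJ and EXAMPLE_FREE_OBJ, the example_vector_* functions and the counting allocator below are all made up for illustration and are not existing BLIS identifiers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* What a configuration header might contain (hypothetical macro names):
   the developer only names the functions to use. */
#define EXAMPLE_MALLOC_OBJ  counting_malloc
#define EXAMPLE_FREE_OBJ    counting_free

/* What the framework would declare on the developer's behalf: prototypes
   with exactly the malloc()/free() signatures. */
void* EXAMPLE_MALLOC_OBJ(size_t size);
void  EXAMPLE_FREE_OBJ(void* p);

/* Framework code allocates only through the macros. */
double* example_vector_create(size_t n)
{
    assert(n > 0);
    return (double*)EXAMPLE_MALLOC_OBJ(n * sizeof(double));
}

void example_vector_free(double* v)
{
    EXAMPLE_FREE_OBJ(v);
}

/* Developer-supplied object code that must be present at link time.
   Here it just wraps malloc()/free() and counts live allocations. */
static size_t live_allocations = 0;

void* counting_malloc(size_t size)
{
    void* p = malloc(size);
    if (p != NULL) live_allocations++;
    return p;
}

void counting_free(void* p)
{
    if (p != NULL) live_allocations--;
    free(p);
}

size_t example_live_allocations(void) { return live_allocations; }
```

Because the substitutes share malloc()'s signature, swapping in a different allocator only means changing the two #define lines and linking different object code.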
BLIS often refers to fused level 1 BLAS-like operations, but I have not seen any fused level 2 operations (e.g., a single-sweep y := A x and u := A^T v). Is there a plan to support such kernels?
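To make the question concrete, here is a plain C sketch of what is meant by a single-sweep fused operation: each element of A is loaded once and used for both y := A x and u := A^T v. The function name fused_gemv_gemtv and the row-major layout are assumptions made for this illustration; this is not a BLIS kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fused kernel: computes y := A x and u := A^T v in one pass.
   A is m x n, stored row-major with leading dimension lda.
   x and u have length n; y and v have length m. */
void fused_gemv_gemtv(size_t m, size_t n, const double* a, size_t lda,
                      const double* x, double* y,
                      const double* v, double* u)
{
    assert(lda >= n);

    /* u accumulates across rows, so it must start at zero. */
    for (size_t j = 0; j < n; ++j) u[j] = 0.0;

    for (size_t i = 0; i < m; ++i) {
        const double* row = a + i * lda;
        const double  vi  = v[i];
        double        yi  = 0.0;
        for (size_t j = 0; j < n; ++j) {
            const double aij = row[j];  /* loaded once, used twice */
            yi   += aij * x[j];         /* contribution to y[i]    */
            u[j] += aij * vi;           /* contribution to u[j]    */
        }
        y[i] = yi;
    }
}
```

Compared with two separate matrix-vector products, this reads A from memory once instead of twice, which is the motivation for fusing.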
[jhammond@ftlogin2 git]$ cat 0002-generic-gcc-path-instead-of-something-at-IBM-Austin.patch 0003-generic-gcc-path-instead-of-something-at-IBM-Austin.patch
From f02aca90c2c045c3ed7573ff5bb8a82b3e45938b Mon Sep 17 00:00:00 2001
From: Jeff Hammond <[email protected]>
Date: Mon, 31 Mar 2014 21:53:56 +0000
Subject: [PATCH 2/5] generic gcc path instead of something at IBM Austin
---
kernels/power7/3/test/Makefile | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/kernels/power7/3/test/Makefile b/kernels/power7/3/test/Makefile
index 356cde9d..15f27b81 100644
--- a/kernels/power7/3/test/Makefile
+++ b/kernels/power7/3/test/Makefile
@@ -1,5 +1,5 @@
-CC = /opt/at6.0/bin/powerpc64-linux-gcc
+CC = gcc
TARGET_ARCH = -m64 -mvsx
TGTS = exp
--
1.9.1
From edd5efef2508cf3623d55dd47bb92159ebb2ee34 Mon Sep 17 00:00:00 2001
From: Jeff Hammond <[email protected]>
Date: Mon, 31 Mar 2014 21:54:15 +0000
Subject: [PATCH 3/5] generic gcc path instead of something at IBM Austin
---
config/power7/make_defs.mk | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/config/power7/make_defs.mk b/config/power7/make_defs.mk
index df3eb363..de3f05b5 100644
--- a/config/power7/make_defs.mk
+++ b/config/power7/make_defs.mk
@@ -76,7 +76,7 @@ GIT_LOG := $(GIT) log --decorate
#
# --- Determine the C compiler and related flags ---
-CC := /opt/at6.0/bin/powerpc64-linux-gcc
+CC := gcc
# Enable IEEE Standard 1003.1-2004 (POSIX.1d).
# NOTE: This is needed to enable posix_memalign().
CPPROCFLAGS := -D_POSIX_C_SOURCE=200112L
@@ -96,7 +96,7 @@ CFLAGS_KERNELS := $(CDBGFLAGS) $(CKOPTFLAGS) $(CVECFLAGS) $(CWARNFLAGS) $(CMISCF
CFLAGS_NOOPT := $(CDBGFLAGS) $(CWARNFLAGS) $(CMISCFLAGS) $(CPPROCFLAGS)
# --- Determine the archiver and related flags ---
-AR := /opt/at6.0/bin/powerpc64-linux-ar
+AR := ar
ARFLAGS := cru
# --- Determine the linker and related flags ---
--
1.9.1
A user of Elemental has been running into strange performance issues when building on top of BLIS. Setting the environment variable OMP_NUM_THREADS to one still seems to lead to a large performance degradation when the number of independent uses of BLIS times the number of configured threads is larger than the number of cores on the machine.
While they have configured BLIS to use OpenMP with 16 threads, it is often preferable when using the library from within an MPI environment to be able to disable threading at runtime (or at least, decrease the number of threads).
What environment variables should be set to one to have the same effect as exporting OMP_NUM_THREADS=1 for other BLAS libraries? I would humbly suggest either adding support for OMP_NUM_THREADS or adding to the wiki the full list of variables (including BLIS_IR_NT) that need to be configured to have the same effect.
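One possible shape for the first suggestion, purely as a sketch: let a library-specific variable take precedence and fall back to OMP_NUM_THREADS when it is unset. The helper names parse_thread_count and resolve_thread_count are hypothetical, and the precedence rule is an assumption of this example, not a description of how BLIS reads its environment.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Parse a positive thread count from a string. Returns `fallback` if the
   string is NULL, empty, malformed, out of range, or less than 1. */
long parse_thread_count(const char* s, long fallback)
{
    if (s == NULL || *s == '\0') return fallback;

    char* end = NULL;
    errno = 0;
    long value = strtol(s, &end, 10);
    if (errno != 0 || *end != '\0' || value < 1) return fallback;
    return value;
}

/* Hypothetical policy: use the library-specific variable if it is set to
   a valid value, otherwise OMP_NUM_THREADS, otherwise a single thread. */
long resolve_thread_count(const char* specific_name)
{
    assert(specific_name != NULL);
    long omp = parse_thread_count(getenv("OMP_NUM_THREADS"), 1);
    return parse_thread_count(getenv(specific_name), omp);
}
```

A caller would use it as resolve_thread_count("BLIS_IR_NT"), so that exporting OMP_NUM_THREADS=1 alone is enough to get one thread when the specific variable is left unset.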
I've just set up my first blis program, but calling bli_init() causes the program to trigger a SIGILL. I'm using Ubuntu 14.04 GNU/Linux using the Sandymount configuration for BLIS. It also breaks here when I run the test_gemm_blis.x test in blis/test/ or the test suite with the given makefiles.
I've traced it down to inside of bli_const_init() using gdb.
bli_obj_create_const(2.0, &BLIS_TWO);
I've tried re-configuring & rebuilding the library several times but it doesn't seem to help.
On my i5-6500, the configuration is auto-detected as 'reference'. Since there is no skylake ukernel, the config should be set to haswell.
The de facto standard is that the standard BLAS/CBLAS functions take 32-bit integers in their API. Julia experimented with changing this so that they could use 64-bit integers in their main BLAS wrappers, and this worked great for a little while until they discovered that when people started trying to link in other existing BLAS-using code, this code was assuming that BLAS uses 32-bit integers and was causing segfaults. Their solution was to continue to use a 64-bit integer version of BLAS, but with symbols renamed to avoid collisions (so e.g. dgemm_ uses 32-bit integers, and dgemm_64_ uses 64-bit integers... [edited to get the 64-bit symbol names correct])
As mentioned in #37 (comment), it would be great if a single BLIS library could export both 32- and 64-bit versions of these symbols simultaneously. It doesn't look like this would be too hard, since the BLAS2BLIS interface is already generated using C preprocessor magic, and the CBLAS wrapper is already getting programmatically patched...
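The preprocessor side of this could look something like the sketch below, which stamps out one routine twice with different integer types and symbol suffixes. The macro GEN_DDOT and the symbols example_ddot_ and example_ddot_64_ are invented for this example (a simple dot product stands in for the real interface); this is not how the BLAS2BLIS layer is actually written.

```c
#include <assert.h>
#include <stdint.h>

/* Generate a Fortran-style dot product named NAME whose integer arguments
   have type INT_T. Only positive increments are handled, for brevity. */
#define GEN_DDOT(NAME, INT_T)                                         \
double NAME(const INT_T* n, const double* x, const INT_T* incx,       \
            const double* y, const INT_T* incy)                       \
{                                                                     \
    assert(*incx > 0 && *incy > 0);                                   \
    double sum = 0.0;                                                 \
    for (INT_T i = 0; i < *n; ++i)                                    \
        sum += x[i * *incx] * y[i * *incy];                           \
    return sum;                                                       \
}

/* One library, two symbols: 32-bit integers under the plain name,
   64-bit integers under the _64_ suffix. */
GEN_DDOT(example_ddot_,    int32_t)
GEN_DDOT(example_ddot_64_, int64_t)
```

Code that assumes 32-bit integers keeps linking against the plain symbol, while callers that want 64-bit integers bind to the suffixed one, so the two never collide.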
The statement "OpenMP is not supported with Clang" in the build system has been false for almost a year now (http://blog.llvm.org/2015/05/openmp-support_22.html). While the default Mac toolchain does not appear to support it, I've been using Homebrew's clang-omp just fine.
I think that the BLIS build system should test for OpenMP support "the old fashioned way" (i.e. like configure does) and use OpenMP if it is available.
Is there a simple means of modifying BLIS to build a shared library instead of a static library? It seems to be missing from the configure script.
I may have a go at exploring that if I find some time, but if someone else beats me to it, that'd be great too!
If I only want to use the cblas interface so I only include cblas.h, I get the error: ‘f77_int’ was not declared.
Hi,
I compiled BLIS on my server (CentOS 7.1, gcc 4.8.5, 2*Xeon E5-2620v2), but it seems that BLIS can only use a single thread. I used the commands
./configure sandybridge
make -j
make install
to compile and install. I didn't change any file in config/sandybridge. I checked that
#define BLIS_ENABLE_OPENMP
is uncommented in bli_config.h, so it should be able to use OpenMP. Then I used the makefile from the wiki/BuildSystem#linking-against-blis section and added the -fopenmp flag to compile the test program. However, the program is single-threaded. I also tried the script below to test the program, but it still failed to use OpenMP:
export OMP_NUM_THREADS=12
make -f BLIS-Makefile
./testBLIS.x
I tried to use
#define BLIS_ENABLE_PTHREAD
instead of the OpenMP setting, but it still fails.
What should I do to use OpenMP?
The culprit seems to be the load of k in the micro-kernel, which is explicitly a movq. Changing the type of k_iter and k_next to [u?]int64_t should fix it.
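A small C illustration of why this matters, assuming the failure occurs when the integer holding k is only 32 bits wide: an 8-byte load from the address of a 4-byte value also picks up whatever sits next to it. The struct frame_t and both function names are hypothetical and only emulate the effect; they are not taken from the micro-kernel.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* A 32-bit k followed by unrelated data, standing in for neighboring
   bytes in memory. */
typedef struct {
    int32_t k;
    int32_t neighbor;
} frame_t;

/* Emulates what a 64-bit load (such as movq) from the address of k sees:
   all eight bytes starting at k, including the neighbor. */
uint64_t load64_at_k(const frame_t* f)
{
    uint64_t wide;
    assert(sizeof(*f) == sizeof(wide));
    memcpy(&wide, f, sizeof(wide));
    return wide;
}

/* The suggested fix in spirit: widen the value in C first, so that the
   64-bit load reads a genuine 64-bit object. */
uint64_t widen_k(int32_t k)
{
    assert(k >= 0);
    return (uint64_t)k;
}
```

When the neighboring bytes happen to be zero on a little-endian machine, the wide load returns the right value by luck, which may explain why such a bug only shows up intermittently.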