xtensor-stack / xtensor-blas Goto Github PK
View Code? Open in Web Editor NEWBLAS extension to xtensor
License: BSD 3-Clause "New" or "Revised" License
BLAS extension to xtensor
License: BSD 3-Clause "New" or "Revised" License
To enable OpenBLAS, it is necessary to
-DWITH_OPENBLAS
cxxstd
directory from FLENS repositoryWith these tips, the following code costs about 0.1 [s] in my computer whereas it costs 16 [s] without them:
template <size_t D> using xtsize = std::array<size_t, D>;
xt::xtensor<double, 2> a(xtsize<2>{2000, 2000}), b(xtsize<2>{2000, 2000});
xt::linalg::dot(a, b);
I would suggest to notice about WITH_OPENBLAS
option in README.md
with other blas options (such as WITH_MKLBLAS
).
In xtensor, data()
returns a raw pointer to a "first element" from which the position of any element can be computed from the strides and indices.
In the case of a view with negative strides, this may not be the first element of the buffer (it can even be the last if all strides are negative).
LAPACK assumes that the pointer passed to it is always the start of the buffer, and will internally compute the "first" element with respect to the striding scheme:
see e.g. here:
138 IF (incx.LT.0) ix = (-n+1)*incx + 1
139 IF (incy.LT.0) iy = (-n+1)*incy + 1
Hence in xtensor-blas, we compute the start of the buffer to pass it to LAPACK which does the inverse transformation. (This fix was added by @wolfv in #103).
To prevent that redundant computation, should we
data()
to always point to the start of the buffer, which would invalidate the striding schemes on strided views.#define XTENSOR_DEFAULT_LAYOUT layout_type::column_major
causes a simple outer product to fail with this error meassage:
** On entry to DGER parameter number 9 had an illegal value
I want to do matrix multiplication for two tensors with shape (1, 2, 2). Using a (2, 2) shape works, but using an array with an empty dimension (shape (1, 2, 2)) gives Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)
.
I am using xtensor-blas 0.16.1
from conda. I have the following blas library installed:
ii libopenblas-base:amd64 0.3.5+ds-2 amd64 Optimized BLAS (linear algebra) library (shared library)
ii libopenblas-dev:amd64 0.3.5+ds-2 amd64 Optimized BLAS (linear algebra) library (development files)
Code example:
#include <xtensor/xtensor.hpp>
#include <xtensor/xio.hpp>
#include "xtensor-blas/xlinalg.hpp"
template <typename T, size_t N>
std::ostream& operator<<(std::ostream& os, std::array<T, N> a) {
os << "Shape(";
for (auto& i : a) {
os << i << " ";
}
os << ")";
return os;
}
using complex = std::complex<double>;
void works() {
using matrix_t = xt::xtensor<complex, 2>;
matrix_t A{{std::cos(35), 0}, {0, 1}};
matrix_t B{{1, -1}, {-1, 1}};
std::cout << A << std::endl;
std::cout << B << std::endl;
std::cout << A.shape() << std::endl;
std::cout << B.shape() << std::endl;
auto G = xt::linalg::dot(A, B);
std::cout << G << std::endl;
}
void fails() {
using matrix_t = xt::xtensor<complex, 3>;
matrix_t A{{{std::cos(35), 0}, {0, 1}}};
matrix_t B{{{1, -1}, {-1, 1}}};
std::cout << A << std::endl;
std::cout << B << std::endl;
std::cout << A.shape() << std::endl;
std::cout << B.shape() << std::endl;
auto G = xt::linalg::dot(A, B);
std::cout << G << std::endl;
}
int main() {
std::cout << "Working code: " << std::endl;
works();
std::cout << "\nSegmentation fault: " << std::endl;
fails();
}
Output:
Working code:
{{-0.903692+0.i, 0. +0.i},
{ 0. +0.i, 1. +0.i}}
{{ 1.+0.i, -1.+0.i},
{-1.+0.i, 1.+0.i}}
Shape(2 2 )
Shape(2 2 )
{{-0.903692+0.i, 0.903692+0.i},
{-1. +0.i, 1. +0.i}}
Segmentation fault:
{{{-0.903692+0.i, 0. +0.i},
{ 0. +0.i, 1. +0.i}}}
{{{ 1.+0.i, -1.+0.i},
{-1.+0.i, 1.+0.i}}}
Shape(1 2 2 )
Shape(1 2 2 )
EDIT:
I know using xt::squeeze to remove empty dimensions works. I am still curious why the error occurs.
I installed xtl, xtensor and xtensor-blas via conda. Versions:
libblas conda-forge/osx-64::libblas-3.8.0-9_openblas
libcblas conda-forge/osx-64::libcblas-3.8.0-9_openblas
liblapack conda-forge/osx-64::liblapack-3.8.0-9_openblas
liblapacke conda-forge/osx-64::liblapacke-3.8.0-9_openblas
openblas conda-forge/osx-64::openblas-0.3.6-hd44dcd8_2
xtensor conda-forge/osx-64::xtensor-0.20.5-h770b8ee_0
xtensor-blas conda-forge/osx-64::xtensor-blas-0.16.1-h7f1c7fb_0
xtl conda-forge/osx-64::xtl-0.6.4-h770b8ee_0
I use the following settings to build by Xcode project with xtensor:
Project type: command line tool
Search Paths:
/Users/[username]/anaconda3/include
/Users/[username]/anaconda3/lib
Apple Clang - Language - C++:
Link binary with libraries
(both these libraries are in the directory /Users/[username]/anaconda3/lib
)
When building my project, Xcode raises the error No matching function for call to 'gesdd'
in the file /Users/[username]/anaconda3/include/xtensor-blas/xlapack.hpp
. Is there any solution to this problem?
I'm using Xcode 10.2.1 on macOS Mojave 10.14.4.
The following piece of code
#include <iostream>
#include <xtensor/xarray.hpp>
#include <xtensor/xio.hpp>
#include <xtensor/xview.hpp>
#include <xtensor-blas/xlinalg.hpp>
#define SHOW(x) { std::cout << (#x) << " = " << (x) << std::endl; }
int main(int argc, char** argv)
{
using xt::placeholders::_;
xt::xarray<double> A = {{2, 3}, {5, 7}, {11, 13}};
auto A1 = xt::view(A, xt::range(_, 3), 0);
auto A2 = xt::view(A, xt::range(-1, -3-1, -1), 1);
xt::xarray<double> B1 = A1;
xt::xarray<double> B2 = A2;
SHOW(A);
SHOW(A1);
SHOW(A2);
SHOW(xt::linalg::dot(A1, A2));
SHOW(B1);
SHOW(B2);
SHOW(xt::linalg::dot(B1, B2));
return 0;
}
produces the following output
A = {{ 2., 3.},
{ 5., 7.},
{ 11., 13.}}
A1 = { 2., 5., 11.}
A2 = { 13., 7., 3.}
xt::linalg::dot(A1, A2) = { 168.}
B1 = { 2., 5., 11.}
B2 = { 13., 7., 3.}
xt::linalg::dot(B1, B2) = { 94.}
Currently, xtensor-blas tries to return xtensors if the dimensions of the return type are known. I'm not sure if that's a good idea, and in some cases it changes the interface a bit (e.g. in lstsq
the solution is always in 2D which is different from numpy).
Also, people might want to change the dimensionality of the return array.
Should we make the return type selectable, or use xarray?
I tried some test code on xblas::gemm, and got some error message. I just wonder if I used xblas::gemm in a correct way. I got the test code from https://github.com/QuantStack/xtensor-blas/blob/master/test/test_blas.cpp, and made some changes.
The original code is shown as below. It can be compiled with no errors.
xt::xarray<double> X = {{1, 2, 3}, {1, 2, 3}};
auto M = xt::xarray<double>::from_shape({3, 3});
xt::blas::gemm(X, X, M, true, false, 1.0, 0.0);
Then, I made a change on the second line of the original code.
xt::xarray<double> X = {{1, 2, 3}, {1, 2, 3}};
auto M = xt::pyarray<double>::from_shape({3, 3});
xt::blas::gemm(X, X, M, true, false, 1.0, 0.0);
When I compiled the new code, I got the error message: "... xtensor-blas/xblas.hpp:176:40: error: static assertion failed: GEMM result layout cannot be dynamic. ..." Could you please explain the error in detail? Thanks a lot.
import sklearn
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model, datasets
logistic = linear_model.LogisticRegression()
digits = datasets.load_digits()
X_digits = digits.data.astype(int)
y_digits = digits.target.astype(int)
logistic.fit(X_digits, y_digits)
with open("weights.npy", "ab") as f:
np.save(f, logistic.coef_)
with open("coefs.npy", "ab") as f:
np.save(f, logistic.intercept_)
with open("mnist.npy", "ab") as f:
np.save(f, X_digits)
print(X_digits.dot(logistic.coef_.T))
Yields:
[[ 16.04425395 -24.57188278 -18.99064139 ... -12.53595987 -6.59369383
-8.84759654]
[-39.86667809 9.89095239 -21.03906964 ... -29.23834092 -4.68151324
-19.8681834 ]
[-24.13828765 -2.48055447 8.55444032 ... -34.4661001 0.46635158
-26.19119682]
...
[-26.91305732 -5.43773514 -19.43945597 ... -34.21642645 8.02700404
-21.49129071]
[ -7.463867 -9.234329 -25.17197869 ... -26.92164295 -1.00534219
12.71348187]
[-25.10017597 -11.51306815 -20.65175452 ... -40.86844194 5.43239693
-10.62263562]]
After deserializing the matrix and doing dot products on it in xtensor:
#include <iostream>
#include <xtensor/xarray.hpp>
#include <xtensor/xio.hpp>
#include <xtensor/xview.hpp>
#include <xtensor/xexpression.hpp>
#include <xtensor/xnpy.hpp>
#include <xtensor-blas/xlinalg.hpp>
xt::xarray<double> x = xt::load_npy<double>("mnist.npy");
xt::xarray<double> w = xt::load_npy<double>("weights.npy");
xt::xarray<double> b = xt::load_npy<double>("coefs.npy");
std::cout << xt::linalg::dot(x, xt::transpose(w));
yields:
{{ 16.044265, -24.571906, -18.990706, ..., -12.53596 , -6.593694, -8.847597},
{-39.866668, 9.890924, -21.039149, ..., -29.238341, -4.681513, -19.868183},
{-24.138276, -2.480522, 8.554367, ..., -34.4661 , 0.466352, -26.191197},
...,
{-26.913044, nan, -19.439551, ..., -34.216426, 8.027004, -21.491291},
{ -7.463854, -9.234249, -25.172056, ..., -26.921643, nan, 12.713482},
{ nan, nan, -20.651845, ..., -40.868442, 5.432397, nan}}
Which has the same floating point values in most, but contains nan
in some fields which are likely due to floating point overflow or underflow (can't be divide by 0).
Environment info:
(cling) ❯ brew list | grep blas
openblas
(cling) ❯ conda list | grep xtensor
xtensor 0.16.4 0 QuantStack
xtensor-blas 0.11.1 blas_openblash2d50403_0 [blas_openblas] conda-forge
Using OSX v10.13.6 High Sierra
.
I have written this simple code:
#include <iostream>
#include <tuple>
#include <complex>
#include "xtensor/xarray.hpp"
#include "xtensor/xio.hpp"
#include "xtensor/xmath.hpp"
#include "xtensor/xview.hpp"
#include "xtensor/xcomplex.hpp"
#include "xtensor/xoperation.hpp"
#include "xtensor-blas/xlinalg.hpp"
using namespace std;
using namespace xt;
xarray<double> nxn_array(unsigned int);
int main()
{
auto test = nxn_array(3); //array w/ values that go from 1 to 9
auto test1 = ones_like(test);
xarray<complex<double>> eig = linalg::eigvals(test1);
return 0;
}
And it is outputting the linker error:
Error LNK2019 unresolved external symbol _dgeev_ referenced in function "int __cdecl cxxlapack::geev<int>(char,char,int,double *,int,double *,double *,double *,int,double *,int,double *,int)" (??$geev@H@cxxlapack@@YAHDDHPANH000H0H0H@Z) FileName C:\...\FileName\FileName\FileName.obj 1
I don't really know anything about linker errors so I'm not really sure how to resolve this issue.
It seems that OpenBLAS is not available on conda-forge, resulting in travis failure.
Hi @JohanMabille / @SylvainCorlay,
I'd like your input on this. Back in the day, I used xscalar<value_type> for alpha and beta as arguments. The reason was, that I was experimenting with some fancy template overload stuff which would automatically detect an xfunction with an xscalar and evaluate to the correct xblas overload if possible (it was experimental).
Now, it's kinda inconvenient to use the API directly (as in #26) as one needs to explicitly use xscalar(123.0) as argument.
Can we add an automatic conversion constructor (such that one can just pass the number and it get's automatically converted into an xscalar)? Or should we change the interface of the xblas functions to take value_type&
instead?
Cheers,
Wolf
During the install process FLENS is installed to the incorrect directory.
Installing in a fresh conda env then using CMake's find_package
causes a set and check failure.
Here is a simple extension of the dot function. The inputs are assumed to be 3-d numpy ndarrays, and it just returns the dot product of the first slices of the inputs.
inline xt::pyarray<double> dot3d(xt::pyarray<double> &ar0,
xt::pyarray<double> &ar1)
{
xt::xarray<double> ar0_x(ar0);
xt::xarray<double> ar1_x(ar1);
auto out = xt::linalg::dot(xt::view(ar0_x, 0),
xt::view(ar1_x, 0));
return xt::pyarray<double> (out);
}
This appears to compile without issues. Here's the output from the interpreter --
`
temp1 = np.random.randn(50).reshape(2,5,5)
temp2 = np.random.randn(20).reshape(2,5,2)
temp3 = am_ext.dot3d(temp1, temp2)
Traceback (most recent call last):
File "", line 1, in
RuntimeError: No valid layout chosen.
`
Is this an expected behavior?
The context here is, I'm trying to write a function that takes two numpy 3-d arrays and single 1-d index, and apply dot function on the corresponding slices of the inputs based on the index.
The following minimal test fails
TEST(xlinalg, lstsq_m_lessthan_n_minimal)
{
xarray<double> arg_0 = {{ 0., 1.}};
xarray<double> arg_1 = {1.};
auto res = xt::linalg::lstsq(arg_0, arg_1);
}
The error is "Could not find workspace size for gelsd."
According to http://www.netlib.org/lapack/explore-html/d7/d3b/group__double_g_esolve_ga94bd4a63a6dacf523e25ff617719f752.html#ga94bd4a63a6dacf523e25ff617719f752:
ldb
should be >= max(1,max(M,N))
.
From
https://github.com/QuantStack/xtensor-blas/blob/9725a505db5c24f4037f9df4691171037e9f7026/include/xtensor-blas/xlapack.hpp#L693-L702
this looks like b
and ldb
is not handled differently when M < N
.
xt::linalg::inv
fails for 1x1 matrices, since xt::xarray::strides
returns {0,0}
. The following produces a runtime error:
#include <xtensor/xarray.hpp>
#include <xtensor-blas/xlinalg.hpp>
int main()
{
xt::xarray<double> A = xt::zeros<double>({1, 1});
auto B = xt::linalg::inv(A);
}
** On entry to DGETRF, parameter number 4 had an illegal value
info = -4
Assertion failed: (info>=0), function getrf, file XXX/xflens/cxxlapack/interface/getrf.tcc, line 87.
and other things.
We should prevent this either:
This is shown in the following code example.
#include "xtensor/xadapt.hpp"
#include "xtensor-blas/xlinalg.hpp"
int main()
{
std::vector<double> vec{0, 1, 2, 3};
std::vector<size_t> shape{2, 2};
auto arr = xt::adapt(
vec,
shape
);
auto arr2 = xt::adapt(
vec.data(),
vec.size(),
xt::no_ownership(),
shape,
std::vector<int>{2, 1}
);
std::cout << arr << std::endl;
std::cout << xt::linalg::dot(arr, arr) << std::endl;
std::cout << arr2 << std::endl;
std::cout << xt::linalg::dot(arr2, arr2) << std::endl;
return 0;
}
Output:
{{ 0., 1.},
{ 2., 3.}}
{{ 2., 3.},
{ 6., 11.}}
{{ 0., 1.},
{ 2., 3.}}
terminate called after throwing an instance of 'std::runtime_error'
what(): No valid layout chosen.
This is caused by the dynamic layout_type
and get_leading_stride()
in https://github.com/QuantStack/xtensor-blas/blob/master/include/xtensor-blas/xblas_utils.hpp#L129.
Hi, I am trying to use xtensor-blas on Windows. I have followed the steps to build it from source using Cmake properly and had no issues with that.
I am using MSVC to compile one of the examples that were given.
When I run: "cl xtens.cpp /I C:\Program1\include"
Output:
Microsoft (R) C/C++ Optimizing Compiler Version 19.16.27027.1 for x86
Copyright (C) Microsoft Corporation. All rights reserved.
xtens.cpp
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\xlocale(319): warning C4530: C++ exception handler used, but unwind semantics are not enabled. Specify /EHsc
Microsoft (R) Incremental Linker Version 14.16.27027.1
Copyright (C) Microsoft Corporation. All rights reserved.
/out:xtens.exe
xtens.obj
xtens.obj : error LNK2019: unresolved external symbol _cblas_ddot referenced in function "void __cdecl cxxblas::dot(int,double const *,int,double const *,int,double &)" (??$dot@H@cxxblas@@YAXHPBNH0HAAN@Z)
xtens.obj : error LNK2019: unresolved external symbol _cblas_dgemv referenced in function "void __cdecl cxxblas::gemv(enum cxxblas::StorageOrder,enum cxxblas::Transpose,int,int,double,double const *,int,double const *,int,double,double *,int)" (??$gemv@H@cxxblas@@YAXW4StorageOrder@0@W4Transpose@0@HHNPBNH2HNPANH@Z)
xtens.obj : error LNK2019: unresolved external symbol _cblas_dgemm referenced in function "void __cdecl cxxblas::gemm(enum cxxblas::StorageOrder,enum cxxblas::Transpose,enum cxxblas::Transpose,int,int,int,double,double const *,int,double const *,int,double,double *,int)" (??$gemm@H@cxxblas@@YAXW4StorageOrder@0@W4Transpose@0@1HHHNPBNH2HNPANH@Z)
xtens.obj : error LNK2019: unresolved external symbol _cblas_dsyrk referenced in function "void __cdecl cxxblas::syrk(enum cxxblas::StorageOrder,enum cxxblas::StorageUpLo,enum cxxblas::Transpose,int,int,double,double const *,int,double,double *,int)" (??$syrk@H@cxxblas@@YAXW4StorageOrder@0@W4StorageUpLo@0@W4Transpose@0@HHNPBNHNPANH@Z)
xtens.exe : fatal error LNK1120: 4 unresolved externals`
Program1\include\ folder contains folders: xtensor,xtensor-blas,xflens,xtl which contain the header fo respective libraries.
How can I solve this problem?
I am hardcoding the xtensor-blas path in my CMakeLists.txt right now, which is obviously not ideal.
project(KALMANPP)
cmake_minimum_required(VERSION 3.1)
# Set flags and params
set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_FLAGS_DEBUG "-Wall -g -W")
=====> include_directories(/usr/local/include/xtensor-blas/) <=====
find_package(XTENSOR REQUIRED)
find_package(BLAS)
find_package(LAPACK)
add_executable(kalmanpp src/Main.cpp)
target_link_libraries(kalmanpp lapack blas)
I installed xtensor-blas via cmake, identical to the xtensor. All files seem to be installed correctly, /usr/local/include/xtensor-blas/
on my Mac. I'm not an expert with CMake, but looking at your CMakeLists.txt, I can't find any "find_package(XTENSOR_BLAS)" or similar. How can I include it in cmake?
Thanks!
Hi Guys,
As said in the title, xtensor-blasConfig.cmake
is installed by default in share/cmake/xtensor-blas
folder while for the rest of the family such files are installed in lib/cmake/family-name
.
Of course, this not urgent :-) !
Cheers,
Thibaud.
Hey guys,
I noticed that there's a bug in the gemm function. If you try to calculate M = X'*X in the following way:
xt::xarray<double> X = {{1, 2, 3},
{1, 2, 3}};
std::vector<int> shape = {3, 3};
xt::xarray<double> M;
M.reshape(shape);
xt::blas::gemm(X, X, M, xt::xscalar<double>(1), xt::xscalar<double>(0), true, false);
Cheers
As per xtensor-stack/xtensor#417
Cross
At the moment, cross seems to be limited to 1D vectors of size 2 or 3. Numpy doesn't have the constraint for dimensionality and is capable of computing cross for higher dimensionality arrays if the last dimension has length of 2 or 3.
xt::xarray<double> firstArray = xt::ones<double>({ 3, 4, 2 });
xt::xarray<double> secondArray = xt::ones<double>({ 3, 4, 2 });
auto result = xt::linalg::cross(firstArray, secondArray);
std::runtime_error("a or b did not have appropriate size 2 or 3.");
NumPy:
a = numpy.ones([3, 4, 2])
b = numpy.ones([3, 4, 2])
numpy.cross(a, b)
array([[0., 0., 0., 0.],
[0., 0., 0., 0.],
[0., 0., 0., 0.]])
numpy.cross(a, b).shape
(3, 4)
Pinv
Pinv seems to support only 1D or 2D arrays. NumPy does support higher dimensionality.
xt::xarray<double> a = xt::ones<double>({ 1, 2, 3, 4});
auto result = xt::linalg::pinv(a);
XTENSOR_ASSERT_MSG(N == e.derived_cast().dimension(), "Cannot change dimension of xtensor.");
NumPy:
numpy.linalg.pinv(numpy.ones([1, 2, 3, 4]))
array([[[[0.08333333, 0.08333333, 0.08333333],
[0.08333333, 0.08333333, 0.08333333],
[0.08333333, 0.08333333, 0.08333333],
[0.08333333, 0.08333333, 0.08333333]],
[[0.08333333, 0.08333333, 0.08333333],
[0.08333333, 0.08333333, 0.08333333],
[0.08333333, 0.08333333, 0.08333333],
[0.08333333, 0.08333333, 0.08333333]]]])
a = numpy.array([[[1, 2, 3]]])
a.shape
(1, 1, 3)
numpy.linalg.pinv(a)
array([[[0.07142857],
[0.14285714],
[0.21428571]]])
Solve
Solve seems to be limited to <= 2D by XTENSOR_ASSERT.
auto a = xt::xarray<double>({ {0.72999173, 0.94937934, 0.6160863},
{0.67719638, 0.11662088, 0.86027322},
{0.22839025, 0.14736096, 0.36511296} });
auto b = xt::xarray<double>({ {{0.42682534, 0.18769713},
{0.09106305, 0.93373785},
{0.3472225 , 0.05267208}},
{{0.09015187, 0.94741843},
{0.85322852, 0.25576913},
{0.47621316, 0.5724512}} });
auto result = xt::linalg::solve(a, b);
XTENSOR_ASSERT failed.
NumPy:
a = numpy.random.rand(9).reshape(3,3)
b = numpy.random.rand(12).reshape(2, 3, 2)
numpy.linalg.solve(a, b)
array([[[-3.08713549, 3.33567358],
[ 1.29121029, -1.49942674],
[ 2.36096848, -1.33713771]],
[[-1.99462003, -4.35353126],
[-0.03715958, 2.11458264],
[ 2.56698863, 3.43769886]]])
Having ability to compute this functions with higher dimensionality arrays would help anyone doing more demanding calculations.
In order to not run into the issue with xt::pyarray
we should use the size computed from shape here:
Hi, I am trying to use both xtensor-blas
and TBB acceleration at the same time. However, the compilation fails.
I am using the following CMakeLists.txt
cmake_minimum_required(VERSION 3.1)
project(test_blas)
# Try to use only anaconda provided stuff
set(CMAKE_PREFIX_PATH /Users/scheibler/anaconda3/envs/mixiva ${CMAKE_PREFIX_PATH})
find_package(xtl REQUIRED)
find_package(xtensor REQUIRED)
add_executable(test_blas src/test_blas.cpp)
OPTION(XTENSOR_USE_XSIMD "simd acceleration for xtensor" ON)
OPTION(XTENSOR_USE_TBB "enable parallelization using intel TBB" ON)
if(XTENSOR_USE_XSIMD)
set(xsimd_REQUIRED_VERSION 7.0.0)
find_package(xsimd ${xsimd_REQUIRED_VERSION} REQUIRED)
message(STATUS "Found xsimd: ${xsimd_INCLUDE_DIRS}/xsimd")
endif()
if(XTENSOR_USE_TBB)
set(CMAKE_MODULE_PATH "${CMAKE_MODULE_PATH}" "${CMAKE_CURRENT_SOURCE_DIR}/cmake/")
find_package(TBB REQUIRED)
message(STATUS "Found intel TBB: ${TBB_INCLUDE_DIRS}")
endif()
if(MSVC)
target_compile_options(test_blas PRIVATE /EHsc /MP /bigobj)
set(CMAKE_EXE_LINKER_FLAGS /MANIFEST:NO)
endif()
if (CMAKE_CXX_COMPILER_ID MATCHES "Clang" OR
CMAKE_CXX_COMPILER_ID MATCHES "GNU" OR
(CMAKE_CXX_COMPILER_ID MATCHES "Intel" AND NOT WIN32))
target_compile_options(test_blas PRIVATE -march=native -std=c++14)
endif()
add_definitions(-DHAVE_CBLAS=1)
add_definitions(-DXTENSOR_USE_TBB)
add_definitions(-DXTENSOR_USE_SIMD)
# set(BLA_VENDOR Intel10_64lp)
if (WIN32)
find_package(OpenBLAS REQUIRED)
set(BLAS_LIBRARIES ${CMAKE_INSTALL_PREFIX}${OpenBLAS_LIBRARIES})
else()
find_package(BLAS REQUIRED)
find_package(LAPACK REQUIRED)
endif()
message(STATUS "BLAS VENDOR: " ${BLA_VENDOR})
message(STATUS "BLAS LIBRARIES: " ${BLAS_LIBRARIES})
target_link_libraries(test_blas xtensor ${BLAS_LIBRARIES} tbb)
And the code is
#include <iostream>
#include <xtensor/xtensor.hpp>
#include <xtensor/xrandom.hpp>
#include <xtensor-blas/xlinalg.hpp>
int main()
{
xt::xtensor<double, 2, xt::layout_type::row_major> x = xt::random::randn<double>({1000, 1000});
xt::xtensor<double, 2, xt::layout_type::row_major> y = xt::random::randn<double>({1000, 1000});
auto y_T = xt::transpose(y);
xt::xtensor<double, 2> z = xt::linalg::dot(x, y_T);
for (size_t i ; i < 10 ; i++)
z = xt::linalg::dot(x, y_T);
std::cout << z(0, 0) << std::endl;
}
The output of compilation is this.
make test_blas
Scanning dependencies of target test_blas
[ 50%] Building CXX object CMakeFiles/test_blas.dir/src/test_blas.cpp.o
In file included from /Users/scheibler/Documents/Research/Projects/Current/Separation/mixiva/test_xtensor/src/test_blas.cpp:5:
In file included from /Users/scheibler/anaconda3/envs/mixiva/include/xtensor-blas/xlinalg.hpp:29:
In file included from /Users/scheibler/anaconda3/envs/mixiva/include/xtensor-blas/xlapack.hpp:23:
In file included from /Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/cxxlapack.cxx:37:
In file included from /Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/cxxlapack.tcc:36:
In file included from /Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/interface/interface.tcc:36:
In file included from /Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/interface/bbcsd.tcc:39:
In file included from /Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/netlib.h:25:
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:973:39: error:
expected ')'
const FLOAT *LSCALE,
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
expanded from macro 'LSCALE'
#define LSCALE 1000 /* scaling for "fixed point" arithmetic */
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:968:20: note:
to match this '('
LAPACK_IMPL(cggbak)(const char *JOB,
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:990:39: error:
expected ')'
FLOAT *LSCALE,
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
expanded from macro 'LSCALE'
#define LSCALE 1000 /* scaling for "fixed point" arithmetic */
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:982:20: note:
to match this '('
LAPACK_IMPL(cggbal)(const char *JOB,
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:1087:39: error:
expected ')'
FLOAT *LSCALE,
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
expanded from macro 'LSCALE'
#define LSCALE 1000 /* scaling for "fixed point" arithmetic */
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:1070:20: note:
to match this '('
LAPACK_IMPL(cggevx)(const char *BALANC,
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:6787:39: error:
expected ')'
const DOUBLE *LSCALE,
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
expanded from macro 'LSCALE'
#define LSCALE 1000 /* scaling for "fixed point" arithmetic */
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:6782:20: note:
to match this '('
LAPACK_IMPL(dggbak)(const char *JOB,
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:6804:39: error:
expected ')'
DOUBLE *LSCALE,
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
expanded from macro 'LSCALE'
#define LSCALE 1000 /* scaling for "fixed point" arithmetic */
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:6796:20: note:
to match this '('
LAPACK_IMPL(dggbal)(const char *JOB,
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:6902:39: error:
expected ')'
DOUBLE *LSCALE,
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
expanded from macro 'LSCALE'
#define LSCALE 1000 /* scaling for "fixed point" arithmetic */
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:6884:20: note:
to match this '('
LAPACK_IMPL(dggevx)(const char *BALANC,
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:13109:39: error:
expected ')'
const FLOAT *LSCALE,
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
expanded from macro 'LSCALE'
#define LSCALE 1000 /* scaling for "fixed point" arithmetic */
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:13104:20: note:
to match this '('
LAPACK_IMPL(sggbak)(const char *JOB,
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:13126:39: error:
expected ')'
FLOAT *LSCALE,
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
expanded from macro 'LSCALE'
#define LSCALE 1000 /* scaling for "fixed point" arithmetic */
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:13118:20: note:
to match this '('
LAPACK_IMPL(sggbal)(const char *JOB,
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:13224:39: error:
expected ')'
FLOAT *LSCALE,
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
expanded from macro 'LSCALE'
#define LSCALE 1000 /* scaling for "fixed point" arithmetic */
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:13206:20: note:
to match this '('
LAPACK_IMPL(sggevx)(const char *BALANC,
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:19244:39: error:
expected ')'
const DOUBLE *LSCALE,
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
expanded from macro 'LSCALE'
#define LSCALE 1000 /* scaling for "fixed point" arithmetic */
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:19239:20: note:
to match this '('
LAPACK_IMPL(zggbak)(const char *JOB,
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:19261:39: error:
expected ')'
DOUBLE *LSCALE,
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
expanded from macro 'LSCALE'
#define LSCALE 1000 /* scaling for "fixed point" arithmetic */
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:19253:20: note:
to match this '('
LAPACK_IMPL(zggbal)(const char *JOB,
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:19358:39: error:
expected ')'
DOUBLE *LSCALE,
^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
expanded from macro 'LSCALE'
#define LSCALE 1000 /* scaling for "fixed point" arithmetic */
^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:19341:20: note:
to match this '('
LAPACK_IMPL(zggevx)(const char *BALANC,
^
12 errors generated.
make[3]: *** [CMakeFiles/test_blas.dir/src/test_blas.cpp.o] Error 1
make[2]: *** [CMakeFiles/test_blas.dir/all] Error 2
make[1]: *** [CMakeFiles/test_blas.dir/rule] Error 2
make: *** [test_blas] Error 2
The code is compiled on Mac OS X Mojave (10.14.4) with Apple LLVM
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/c++/4.2.1
Apple LLVM version 10.0.1 (clang-1001.0.46.4)
Target: x86_64-apple-darwin18.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
When I add #undef XTENSOR_USE_TBB
at the beginning of the code, it compiles and runs as expected.
How should user consume xtensor-blas? I discovered that there is an xtensor_blas_INCLUDE_DIRS
that I can use to populate targets in my library via target_include_dirs
but I would prefer to simply use target_link_libraries(mylib <LINKAGE> xtensor_blas)
.
xtensor has an optimization where a stride can be 0.
We should add some tests to cover this special stride length, and make sure that the BLAS implementations we're testing against are fine with it.
I'm trying to reach the same performance as NumPy. With this code below I evaluate the speed of dot product using xtensor-blas:
benchmark.cpp
#include <iostream>
#include "xtensor/xarray.hpp"
#include "xtensor/xrandom.hpp"
#include "xtensor-blas/xlinalg.hpp"
xt::xtensor<double, 2> my_dot(xt::xtensor<double, 2> &A, xt::xtensor<double, 2> &B){
auto start = std::chrono::steady_clock::now();
xt::xtensor<double, 2> C = xt::linalg::dot(A, B);
auto end = std::chrono::steady_clock::now();
std::cout << std::chrono::duration<double, std::milli>(end - start) .count()
<< " ms" << std::endl;
return C;
}
int main(int argc, char *argv[])
{
for (int i=0;i<3;i++){
xt::xtensor<double, 2> A = xt::random::randn({1000,2000}, 0., 1.);
xt::xtensor<double, 2> B = xt::random::randn({2000,2000}, 0., 1.);
xt::xtensor<double, 2> C = my_dot(A, B);
}
return 0;
}
I compile this with this command:
$ g++ benchmark.cpp -O3 -mavx2 -ffast-math \
-I/home/--user--/install_path/include \
-lblas -llapack \
-DHAVE_CBLAS=1 \
-o a.out
and I get
$ ./a.out
2395.9 ms
2415.98 ms
2461.93 ms
(Install_path/include contains all necessary libraries I.e xtensor, xtensor-blas...)
While when I multiply arrays of the same size in NumPy I get
$ python benchmark.py
76.17 ms
76.05 ms
79.00 ms
So this is mush SLOWER!
I've installed sudo apt-get install libblas-dev liblapack-dev
System:
Ubuntu: 18.04
gcc: 7.4.0
latest versions of xtensor libraries
Could you please help me with this?
When mulitplying two arrays, the result is always vector of 0.
(shape agrees, tho).
auto x = xt::random::randn<double>({784});
auto weights = xt::xarray<double> ({1}); // should perform identity
auto res = xt::linalg::outer(x, xt::transpose(weights));
auto res = xt::linalg::tensordot(x, xt::transpose(weights), 0); // same result, also same without transpose
The output was observed in Jupyter Notebook... however, when compiled and executed in my program, I get even stranger error:
Parameter 8 to routine cblas_dger was incorrect
Quick help appreciated!
EDIT: This seems to be the case only for xt::random::randn
. When using xt::arange
, the output is as expected.
xflens defines a number of macros that have names that can easily collide such as INTEGER
or FLOAT
.
This is an issue when using e.g. both symengine
an xtensor-blas
in the same notebook.
I tried to do the following:
xtensor<double, 1> A{1, 2, 3, 4, 5},
B{0.25, 0.5, 0.25},
C(B.shape(), 0);
// create a view to A looking like
// 1, 2, 3,
// 2, 3, 4,
// 3, 4, 5
static_shape<std::size_t, 2> shape{3, 3}, strides{1, 1};
auto A_as_matrix = adapt(&A(0), 9, no_ownership(), shape, strides);
// call matrix multiplication
C = linalg::dot(A_as_matrix, B);
but got the error message "No valid layout chosen." from get_leading_stride()
(xblas_utils.hpp, line 118). However, when I hard-wired get_leading_stride()
to return 1, I got the desired result, so my general idea seems to be valid. How can this be solved?
Current git version tests fail to compile when building test_tensordot.cpp.o.
The message (repeated several times) is:
/usr/include/xtensor/xiterator.hpp:324:86: error: invalid use of incomplete type ‘struct xt::detail::LAYOUT_FORBIDEN_FOR_XITERATOR<(xt::layout_type)0>’
using checking_type = typename detail::LAYOUT_FORBIDEN_FOR_XITERATOR::type;
^
/usr/include/xtensor/xiterator.hpp:271:16: note: declaration of ‘struct xt::detail::LAYOUT_FORBIDEN_FOR_XITERATOR<(xt::layout_type)0>’
struct LAYOUT_FORBIDEN_FOR_XITERATOR;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~
This is with gcc version 8.3.1 20190223 (Red Hat 8.3.1-2) (GCC)
Simple reproducible:
xt::xarray<float> __im2col = {{ 2., 3., 3., 4.},
{ 2., 1., 3., 2.}};
xt::xarray<float> __f = {{ 2., 3., 3.},
{ 1., 4., 5.},
{ 1., 3., 5.},
{ 2., 2., 2.}};
auto _result = xt::linalg::dot(__im2col, __f);
Gives a stack trace in lldb:
EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
frame #0: 0x000000010067d894 unit_tests`mkl_blas_avx2_dgemm_kernel_nocopy_TN_b0 + 48724
unit_tests`mkl_blas_avx2_sgemm_kernel_nocopy_TN_b0:
-> 0x10067d894 <+48724>: vgatherqpd %ymm6, -0x80(%rcx,%ymm7), %ymm0
0x10067d89b <+48731>: vmovaps %ymm3, %ymm6
0x10067d89f <+48735>: vgatherqpd %ymm6, -0x78(%rcx,%ymm7), %ymm1
0x10067d8a6 <+48742>: vmovaps %ymm3, %ymm6
When I link with cblas
instead, it does not fault.
I am using apple clang:
(idp3) ❯ g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 9.1.0 (clang-902.0.39.2)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
It is tested with gcc-8 that this error does NOT occur.
The xt::linalg::trace
cannot deal with complex arrays, which is reflected in this source code (line 1315):
template <class T>
auto trace(const xexpression<T>& M, int offset = 0, int axis1 = 0, int axis2 = 1)
{
const auto& dM = M.derived_cast();
auto d = xt::diagonal(dM, offset, std::size_t(axis1), std::size_t(axis2));
std::size_t dim = d.dimension();
if (dim == 1)
{
return xt::xarray<double>(xt::sum(d)());
}
else
{
return xt::xarray<double>(xt::sum(d, {dim - 1}));
}
}
I tried to modify this part directly by changing xarray<double>
to xarray<std::complex<double>>
. But after I do this, I get another error from xutils.hpp
of xtensor
(line 628):
template <class T>
struct conditional_cast_functor<true, T>
{
template <class U>
inline auto operator()(U&& u) const
{
return static_cast<T>(std::forward<U>(u));
}
};
Error:
Cannot convert 'const std::__1::complex<double>' to 'double' without a conversion operator
Although there is a workaround (tracing the real and the complex parts separately), this is not convenient. Could you please look into this problem and fix it? Thanks!
Hello!
I found a buffer overflow in xtensor-blas while dotting 2 array trough address sanitizer.
Source:
#include <iostream>
#include <xtensor/xarray.hpp>
#include <xtensor/xio.hpp>
#include <xtensor-blas/xlinalg.hpp>
using namespace std;
int main()
{
auto a = xt::zeros<float>({4,2});
auto b = xt::zeros<float>({2,5});
cout << xt::linalg::dot(a, b) << endl;
return 0;
}
Compiled with parameters -O3 -fsanitize=address
on GCC and clang and running the binary results in the following.
=================================================================
==9741==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x604000000078 at pc 0x564c811eda8b bp 0x7ffccb3e7300 sp 0x7ffccb3e72f8
READ of size 4 at 0x604000000078 thread T0
#0 0x564c811eda8a in void cxxblas::gemv_generic<int, float, float, float, float, float>(cxxblas::StorageOrder, cxxblas::Transpose, cxxblas::Transpose, int, int, float const&, float const*, int, float const*, int, float const&, float*, int) (/home/marty/文件/xtensor-test/main+0x11ea8a)
#1 0x564c811f1fe1 in void cxxblas::gemm_generic<int, float, float, float, float, float>(cxxblas::StorageOrder, cxxblas::Transpose, cxxblas::Transpose, int, int, int, float const&, float const*, int, float const*, int, float const&, float*, int) (/home/marty/文件/xtensor-test/main+0x122fe1)
#2 0x564c811e7814 in auto xt::linalg::dot<xt::xbroadcast<xt::xscalar<float const>, std::array<unsigned long, 2ul> >, xt::xbroadcast<xt::xscalar<float const>, std::array<unsigned long, 2ul> > >(xt::xexpression<xt::xbroadcast<xt::xscalar<float const>, std::array<unsigned long, 2ul> > > const&, xt::xbroadcast<xt::xscalar<float const>, std::array<unsigned long, 2ul> ><xt::xbroadcast<xt::xscalar<float const>, std::array<unsigned long, 2ul> > > const&) (/home/marty/文件/xtensor-test/main+0x118814)
#3 0x564c811e6201 in main (/home/marty/文件/xtensor-test/main+0x117201)
#4 0x7f95181a2f49 in __libc_start_main (/usr/lib/libc.so.6+0x20f49)
#5 0x564c810ea989 in _start (/home/marty/文件/xtensor-test/main+0x1b989)
0x604000000078 is located 0 bytes to the right of 40-byte region [0x604000000050,0x604000000078)
allocated by thread T0 here:
#0 0x564c811e1c71 in operator new(unsigned long) (/home/marty/文件/xtensor-test/main+0x112c71)
#1 0x564c811ea010 in void xt::xstrided_container<xt::xtensor_container<xt::uvector<float, std::allocator<float> >, 2ul, (xt::layout_type)255, xt::xtensor_expression_tag> >::reshape<std::array<unsigned long, 2ul> >(std::array<unsigned long, 2ul>&&, bool) (/home/marty/文件/xtensor-test/main+0x11b010)
SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/marty/文件/xtensor-test/main+0x11ea8a) in void cxxblas::gemv_generic<int, float, float, float, float, float>(cxxblas::StorageOrder, cxxblas::Transpose, cxxblas::Transpose, int, int, float const&, float const*, int, float const*, int, float const&, float*, int)
Shadow bytes around the buggy address:
0x0c087fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c087fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c087fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c087fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
0x0c087fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c087fff8000: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00[fa]
0x0c087fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c087fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c087fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c087fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
0x0c087fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
Addressable: 00
Partially addressable: 01 02 03 04 05 06 07
Heap left redzone: fa
Freed heap region: fd
Stack left redzone: f1
Stack mid redzone: f2
Stack right redzone: f3
Stack after return: f5
Stack use after scope: f8
Global redzone: f9
Global init order: f6
Poisoned by user: f7
Container overflow: fc
Array cookie: ac
Intra object redzone: bb
ASan internal: fe
Left alloca redzone: ca
Right alloca redzone: cb
==9741==ABORTING
No idea if this is a xtensor-blas bug or a BLAS library bug thought.
Many thanks.
This is the simplest example that I can construct that still produce this error.
xt::pytensor<double, 2> dot_test(xt::pytensor<double, 3> & ar0,
xt::pytensor<double, 2> & ar1,
xt::pytensor<long, 1> & idx) {
auto shape0 = ar0.shape();
auto shape1 = ar1.shape();
xt::pytensor<double, 2> out({shape0[0], shape0[1]});
out.fill(NAN);
for (unsigned j=0; j<len(idx); ++j)
xt::view(out, j) =
xt::linalg::dot(xt::view(ar0, j),
xt::view(ar1, idx(j)));
return out;
}
If we give a non-contiguous numpy array as the first argument, it fails.
m1 = np.random.randn(30).reshape(2,3,5)
m2 = np.random.randn(4).reshape(2,2)
xt_tools.dot_test(m1[:,:,2:4], m2, np.arange(2))
Traceback (most recent call last):
File "", line 1, in
RuntimeError: No valid layout chosen.
**As the issue I raised a year ago, replacing the dot product line with the following somehow solves the problem. But this is probably not a good permanent fix.
xt::linalg::dot(xt::eval(xt::view(ar0, j)), xt::eval(xt::view(ar1, idx(j))));
I used the cookiecutter to start my own development environment (https://github.com/QuantStack/xtensor-python-cookiecutter). After verifying that it works, I simply added a new function that involves xt::linalg::inv to test out the linear algebra library.
inline xt::pyarray<double> example3(xt::pyarray<double> &m) { return xt::linalg::inv(m); }
And also added the include lines
#include "xtensor-blas/xblas.hpp" #include "xtensor-blas/xlinalg.hpp"
It did compile with the existing setup.py, but upon import I get this error from the interpreter.
ImportError: dlopen(xtensor-example-extension/xtensor_example_extension.cpython-36m-darwin.so, 2): Symbol not found: dgetrf
Referenced from: xtensor-example-extension/xtensor_example_extension.cpython-36m-darwin.so
Expected in: flat namespace
Does anyone know why this is happening? Do I need to add something to setup.py?
Next code crashes on destruction of a3
:
{
xt::xarray<double> a1{1, 2};
{
xt::xarray<double> a2{
{1, 1, 1},
{1, 2, 3}
};
{
xt::xarray<double> a3 = xt::linalg::dot(a1, a2);
std::cout << a3 << std::endl;
} // crashes here
}
}
Also, I expected (3 5 7) as result but the console says {3, 5} (wrong result, wrong shape).
Is this a bug? Or, maybe I'm using xtensor-blas in wrong way, here?
I'm using the xTensor master, with xTL 0.5.4 and OpenBlas (v0.2.14.1) on the Windows 10 x64.
When I tried to use xt::linalg::pinv as in QuanStack/xtensor-blas/test/test_linalg.cpp, line 225 in the debug mode:
xt::xtensor<float, 2> d1 {{1.f, 2.f}}; auto r1 = xt::linalg::pinv(d1);
With Visual Studio 2017, the test crashed right away with the following error:
Message: Invalid parameter detected in function std::array<__int64, 1>::operator [], c:\program files (x86)\microsoft visual studio\2017\enterprise\vc\tools\msvc\14.16.27023\include\array line 184, Expression: “array subscript out of
range”
With Visual Studio 2015, the test passed. However it would crash with the same error as in Visual Studio 2017 when the debugger is attached.
Is it a known issue? Any suggestion?
Support for float precision is missing
xt::linalg::norm(xt::arange<float>(15), 1); // fails to build
xt::linalg::norm(xt::arange<double>(15), 1); // build ok
Added test to showcase this
#127
Due to "Command killed due to excessive memory consumption/
".
We might be able to get around this by ignoring FLENS headers.
xt::linalg::inv
uses LAPACK's getri
, which, in the first call, returns the size of the workspace needed. For matrices with std::complex<double>
scalar type, the following line fails:
At the moment, at least dot and solve seems to handle empty arrays improperly compared to NumPy.
dot
xt::xarray<double> a = xt::ones<double>({ 1, 2, 0, 1});
xt::xarray<double> b = xt::ones<double>({ 1, 0 });
auto result = xt::linalg::dot(a, b);
access violation - cblas_ddot
NumPy:
a = numpy.ones([1, 2, 0, 1])
b = numpy.ones([1, 0])
numpy.dot(a, b)
array([], shape=(1, 2, 0, 0), dtype=float64)
solve
auto a = xt::ones<double>({ 0, 0 });
auto b = xt::ones<double>({ 0 });
auto result = xt::linalg::solve(a, b);
C++ runtime assertion failed
File: gesv.tcc
Line: 98
Expression: info>=0
Numpy:
a = numpy.ones([0, 0])
b = numpy.ones([0])
numpy.linalg.solve(a, b)
array([], dtype=float64)
We need to enhance the docs (also in response to #63)
Hi,
When I install xtensor-blas, the content of the directory xtensor-blas/flens
is directly copied into installpath/include/
. It follows that the variable FLENS_INCLUDE_DIR
provided by xtensor-blasConfig.cmake
is not defined.
Furthermore, the project name in the xtensor-blasConfig.cmake
writes xtensor_blas while one could expect to find xtensor-blas instead.
If I correct CMakeLists by appending xtensor-blas/flens
for the install include path and if I use variable ${xtensor_blas_INCLUDE_DIRS}
, I am able to use xtensor-blas as third-party library.
By the way, I really like the work you are doing ;-) !
Cheers,
Thibaud Kloczko
[email protected]
The ger
implementation should proably use get_leading_stride
as well (eventually the other blas functions, too).
https://github.com/QuantStack/xtensor-blas/blob/master/include/xtensor-blas/xblas.hpp#L212-L233
otherwise, we still use the FLENS overloads.
A declarative, efficient, and flexible JavaScript library for building user interfaces.
🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.
TypeScript is a superset of JavaScript that compiles to clean JavaScript output.
An Open Source Machine Learning Framework for Everyone
The Web framework for perfectionists with deadlines.
A PHP framework for web artisans
Bring data to life with SVG, Canvas and HTML. 📊📈🎉
JavaScript (JS) is a lightweight interpreted programming language with first-class functions.
Some thing interesting about web. New door for the world.
A server is a program made to process requests and deliver data to clients.
Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.
Some thing interesting about visualization, use data art
Some thing interesting about game, make everyone happy.
We are working to build community through open source technology. NB: members must have two-factor auth.
Open source projects and samples from Microsoft.
Google ❤️ Open Source for everyone.
Alibaba Open Source for everyone
Data-Driven Documents codes.
China tencent open source team.