xtensor-stack / xtensor-blas Goto Github PK

View Code? Open in Web Editor NEW

155.0 14.0 50.0 4.31 MB

BLAS extension to xtensor

License: BSD 3-Clause "New" or "Revised" License

Emacs Lisp 0.01% CMake 0.33% C++ 85.80% C 13.74% Makefile 0.01% Perl 0.07% Python 0.05%

blas linear-algebra xtensor lapack c-plus-plus-14

xtensor-blas's People

Contributors

Stargazers

Watchers

xtensor-blas's Issues

OpenBLAS is disabled by default

To enable OpenBLAS, it is necessary to

add a compile flag -DWITH_OPENBLAS
copy cxxstd directory from FLENS repository

With these tips, the following code costs about 0.1 [s] in my computer whereas it costs 16 [s] without them:

template <size_t D> using xtsize = std::array<size_t, D>;

xt::xtensor<double, 2> a(xtsize<2>{2000, 2000}), b(xtsize<2>{2000, 2000});
xt::linalg::dot(a, b);

I would suggest to notice about WITH_OPENBLAS option in README.md with other blas options (such as WITH_MKLBLAS).

Semantics of xtensor's `data()` with views and negative strides

In xtensor, data() returns a raw pointer to a "first element" from which the position of any element can be computed from the strides and indices.

In the case of a view with negative strides, this may not be the first element of the buffer (it can even be the last if all strides are negative).

LAPACK assumes that the pointer passed to it is always the start of the buffer, and will internally compute the "first" element with respect to the striding scheme:

see e.g. here:

138          IF (incx.LT.0) ix = (-n+1)*incx + 1
139          IF (incy.LT.0) iy = (-n+1)*incy + 1

Hence in xtensor-blas, we compute the start of the buffer to pass it to LAPACK which does the inverse transformation. (This fix was added by @wolfv in #103).

To prevent that redundant computation, should we

change the semantics of data() to always point to the start of the buffer, which would invalidate the striding schemes on strided views.
provide another function?
not do anything?

redefining the Default_layout causes xt::linalg::outer to fail

#define XTENSOR_DEFAULT_LAYOUT layout_type::column_major causes a simple outer product to fail with this error meassage:
** On entry to DGER parameter number 9 had an illegal value

Segmentation fault using dot for tensors with one empty dimension

I want to do matrix multiplication for two tensors with shape (1, 2, 2). Using a (2, 2) shape works, but using an array with an empty dimension (shape (1, 2, 2)) gives Process finished with exit code 139 (interrupted by signal 11: SIGSEGV).
I am using xtensor-blas 0.16.1 from conda. I have the following blas library installed:

ii  libopenblas-base:amd64  0.3.5+ds-2  amd64  Optimized BLAS (linear algebra) library (shared library)
ii  libopenblas-dev:amd64   0.3.5+ds-2  amd64  Optimized BLAS (linear algebra) library (development files)

Code example:

#include <xtensor/xtensor.hpp>
#include <xtensor/xio.hpp>
#include "xtensor-blas/xlinalg.hpp"
template <typename T, size_t N>

std::ostream& operator<<(std::ostream& os, std::array<T, N> a) {
    os << "Shape(";
    for (auto& i : a) {
        os << i << " ";
    }
    os << ")";
    return os;
}

using complex = std::complex<double>;

void works() {
    using matrix_t = xt::xtensor<complex, 2>;
    matrix_t A{{std::cos(35), 0}, {0, 1}};
    matrix_t B{{1, -1}, {-1, 1}};
    std::cout << A << std::endl;
    std::cout << B << std::endl;
    std::cout << A.shape() << std::endl;
    std::cout << B.shape() << std::endl;
    auto G = xt::linalg::dot(A, B);
    std::cout << G << std::endl;
}

void fails() {
    using matrix_t = xt::xtensor<complex, 3>;
    matrix_t A{{{std::cos(35), 0}, {0, 1}}};
    matrix_t B{{{1, -1}, {-1, 1}}};
    std::cout << A << std::endl;
    std::cout << B << std::endl;
    std::cout << A.shape() << std::endl;
    std::cout << B.shape() << std::endl;
    auto G = xt::linalg::dot(A, B);
    std::cout << G << std::endl;
}

int main() {
    std::cout << "Working code: " << std::endl;
    works();
    std::cout << "\nSegmentation fault: " << std::endl;
    fails();
}

Output:

Working code: 
{{-0.903692+0.i,  0.      +0.i},
 { 0.      +0.i,  1.      +0.i}}
{{ 1.+0.i, -1.+0.i},
 {-1.+0.i,  1.+0.i}}
Shape(2 2 )
Shape(2 2 )
{{-0.903692+0.i,  0.903692+0.i},
 {-1.      +0.i,  1.      +0.i}}

Segmentation fault: 
{{{-0.903692+0.i,  0.      +0.i},
  { 0.      +0.i,  1.      +0.i}}}
{{{ 1.+0.i, -1.+0.i},
  {-1.+0.i,  1.+0.i}}}
Shape(1 2 2 )
Shape(1 2 2 )

EDIT:
I know using xt::squeeze to remove empty dimensions works. I am still curious why the error occurs.

No matching function for call to 'gesdd' when building with Xcode

I installed xtl, xtensor and xtensor-blas via conda. Versions:

libblas            conda-forge/osx-64::libblas-3.8.0-9_openblas
libcblas           conda-forge/osx-64::libcblas-3.8.0-9_openblas
liblapack          conda-forge/osx-64::liblapack-3.8.0-9_openblas
liblapacke         conda-forge/osx-64::liblapacke-3.8.0-9_openblas
openblas           conda-forge/osx-64::openblas-0.3.6-hd44dcd8_2
xtensor            conda-forge/osx-64::xtensor-0.20.5-h770b8ee_0
xtensor-blas       conda-forge/osx-64::xtensor-blas-0.16.1-h7f1c7fb_0
xtl                conda-forge/osx-64::xtl-0.6.4-h770b8ee_0

I use the following settings to build by Xcode project with xtensor:

Project type: command line tool

Project - Build settings -

Search Paths:

Header search paths: /Users/[username]/anaconda3/include
Library search paths: /Users/[username]/anaconda3/lib

Apple Clang - Language - C++:

C++ language dialect: GNU++14 [-std=gnu++14]
C++ standard library: libc++ (LLVM C++ standard library with C++ 11 support)

Target - Build phases

Link binary with libraries

libopenblasp-r0.3.6.a
libopenblasp-r0.3.6.dylib

(both these libraries are in the directory /Users/[username]/anaconda3/lib)

When building my project, Xcode raises the error No matching function for call to 'gesdd' in the file /Users/[username]/anaconda3/include/xtensor-blas/xlapack.hpp. Is there any solution to this problem?

I'm using Xcode 10.2.1 on macOS Mojave 10.14.4.

`xt::linalg::dot()` with `xt::view` does not work as expected

The following piece of code

#include <iostream>
#include <xtensor/xarray.hpp>
#include <xtensor/xio.hpp>
#include <xtensor/xview.hpp>
#include <xtensor-blas/xlinalg.hpp>

#define SHOW(x) { std::cout << (#x) << " = " << (x) << std::endl; }

int main(int argc, char** argv)
{
  using xt::placeholders::_;
  xt::xarray<double> A = {{2, 3}, {5, 7}, {11, 13}};
  auto A1 = xt::view(A, xt::range(_, 3), 0);
  auto A2 = xt::view(A, xt::range(-1, -3-1, -1), 1);
  xt::xarray<double> B1 = A1;
  xt::xarray<double> B2 = A2;

  SHOW(A);
  SHOW(A1);
  SHOW(A2);
  SHOW(xt::linalg::dot(A1, A2));
  SHOW(B1);
  SHOW(B2);
  SHOW(xt::linalg::dot(B1, B2));

  return 0;
}

produces the following output

A = {{  2.,   3.},
 {  5.,   7.},
 { 11.,  13.}}
A1 = {  2.,   5.,  11.}
A2 = { 13.,   7.,   3.}
xt::linalg::dot(A1, A2) = { 168.}
B1 = {  2.,   5.,  11.}
B2 = { 13.,   7.,   3.}
xt::linalg::dot(B1, B2) = { 94.}

Return types

Currently, xtensor-blas tries to return xtensors if the dimensions of the return type are known. I'm not sure if that's a good idea, and in some cases it changes the interface a bit (e.g. in lstsq the solution is always in 2D which is different from numpy).

Also, people might want to change the dimensionality of the return array.

Should we make the return type selectable, or use xarray?

xblas::gemm cannot take a variable of type xt::pyarray<double> as output

I tried some test code on xblas::gemm, and got some error message. I just wonder if I used xblas::gemm in a correct way. I got the test code from https://github.com/QuantStack/xtensor-blas/blob/master/test/test_blas.cpp, and made some changes.

The original code is shown as below. It can be compiled with no errors.

xt::xarray<double> X = {{1, 2, 3}, {1, 2, 3}};
auto M = xt::xarray<double>::from_shape({3, 3});
xt::blas::gemm(X, X, M, true, false, 1.0, 0.0);

Then, I made a change on the second line of the original code.

xt::xarray<double> X = {{1, 2, 3}, {1, 2, 3}};
auto M = xt::pyarray<double>::from_shape({3, 3});
xt::blas::gemm(X, X, M, true, false, 1.0, 0.0);

When I compiled the new code, I got the error message: "... xtensor-blas/xblas.hpp:176:40: error: static assertion failed: GEMM result layout cannot be dynamic. ..." Could you please explain the error in detail? Thanks a lot.

[linalg] cling xtensor dot product results inconsistent with numpy - `nan`'s

import sklearn
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model, datasets
logistic = linear_model.LogisticRegression()
digits = datasets.load_digits()
X_digits = digits.data.astype(int)
y_digits = digits.target.astype(int)
logistic.fit(X_digits, y_digits)

with open("weights.npy", "ab") as f:
    np.save(f, logistic.coef_)
with open("coefs.npy", "ab") as f:
    np.save(f, logistic.intercept_)
with open("mnist.npy", "ab") as f:
    np.save(f, X_digits)

print(X_digits.dot(logistic.coef_.T))

Yields:

[[ 16.04425395 -24.57188278 -18.99064139 ... -12.53595987  -6.59369383
   -8.84759654]
 [-39.86667809   9.89095239 -21.03906964 ... -29.23834092  -4.68151324
  -19.8681834 ]
 [-24.13828765  -2.48055447   8.55444032 ... -34.4661001    0.46635158
  -26.19119682]
 ...
 [-26.91305732  -5.43773514 -19.43945597 ... -34.21642645   8.02700404
  -21.49129071]
 [ -7.463867    -9.234329   -25.17197869 ... -26.92164295  -1.00534219
   12.71348187]
 [-25.10017597 -11.51306815 -20.65175452 ... -40.86844194   5.43239693
  -10.62263562]]

After deserializing the matrix and doing dot products on it in xtensor:

#include <iostream>
#include <xtensor/xarray.hpp>
#include <xtensor/xio.hpp>
#include <xtensor/xview.hpp>
#include <xtensor/xexpression.hpp>
#include <xtensor/xnpy.hpp>

#include <xtensor-blas/xlinalg.hpp>
xt::xarray<double> x = xt::load_npy<double>("mnist.npy");
xt::xarray<double> w = xt::load_npy<double>("weights.npy");
xt::xarray<double> b = xt::load_npy<double>("coefs.npy");

std::cout << xt::linalg::dot(x, xt::transpose(w));

yields:

{{ 16.044265, -24.571906, -18.990706, ..., -12.53596 ,  -6.593694,  -8.847597},
 {-39.866668,   9.890924, -21.039149, ..., -29.238341,  -4.681513, -19.868183},
 {-24.138276,  -2.480522,   8.554367, ..., -34.4661  ,   0.466352, -26.191197},
 ..., 
 {-26.913044,        nan, -19.439551, ..., -34.216426,   8.027004, -21.491291},
 { -7.463854,  -9.234249, -25.172056, ..., -26.921643,        nan,  12.713482},
 {       nan,        nan, -20.651845, ..., -40.868442,   5.432397,        nan}}

Which has the same floating point values in most, but contains nan in some fields which are likely due to floating point overflow or underflow (can't be divide by 0).

Environment info:

(cling) ❯ brew list | grep blas
openblas

(cling) ❯ conda list | grep xtensor
xtensor                   0.16.4                        0    QuantStack
xtensor-blas              0.11.1          blas_openblash2d50403_0  [blas_openblas]  conda-forge

Using OSX v10.13.6 High Sierra.

LNK2019 error

I have written this simple code:

#include <iostream>
#include <tuple>
#include <complex>
#include "xtensor/xarray.hpp"
#include "xtensor/xio.hpp"
#include "xtensor/xmath.hpp"
#include "xtensor/xview.hpp"
#include "xtensor/xcomplex.hpp"
#include "xtensor/xoperation.hpp"
#include "xtensor-blas/xlinalg.hpp"

using namespace std;
using namespace xt;

xarray<double> nxn_array(unsigned int);

int main()
{
	auto test = nxn_array(3); //array w/ values that go from 1 to 9
	auto test1 = ones_like(test);
	xarray<complex<double>> eig = linalg::eigvals(test1);
        return 0;
}

And it is outputting the linker error:

Error	LNK2019	unresolved external symbol _dgeev_ referenced in function "int __cdecl cxxlapack::geev<int>(char,char,int,double *,int,double *,double *,double *,int,double *,int,double *,int)" (??$geev@H@cxxlapack@@YAHDDHPANH000H0H0H@Z)	FileName C:\...\FileName\FileName\FileName.obj	1

I don't really know anything about linker errors so I'm not really sure how to resolve this issue.

No support for ndimensional norm value in xt::linalg::norm

Travis broken with OpenBLAS

It seems that OpenBLAS is not available on conda-forge, resulting in travis failure.

xblas::gemm taking xscalar<double/...> instead of double for alpha/beta

Hi @JohanMabille / @SylvainCorlay,

I'd like your input on this. Back in the day, I used xscalar<value_type> for alpha and beta as arguments. The reason was, that I was experimenting with some fancy template overload stuff which would automatically detect an xfunction with an xscalar and evaluate to the correct xblas overload if possible (it was experimental).

Now, it's kinda inconvenient to use the API directly (as in #26) as one needs to explicitly use xscalar(123.0) as argument.

Can we add an automatic conversion constructor (such that one can just pass the number and it get's automatically converted into an xscalar)? Or should we change the interface of the xblas functions to take value_type& instead?

Cheers,

Wolf

Requiring xtensor-blas as a CMake dependency fails.

During the install process FLENS is installed to the incorrect directory.

https://github.com/QuantStack/xtensor-blas/blob/b946e47c42d13a8ce4b49b6f2d7f0b67fee257d7/CMakeLists.txt#L102-L103

Installing in a fresh conda env then using CMake's find_package causes a set and check failure.

https://github.com/QuantStack/xtensor-blas/blob/b946e47c42d13a8ce4b49b6f2d7f0b67fee257d7/xtensor-blasConfig.cmake.in#L20

linalg::dot does not work well with view

Here is a simple extension of the dot function. The inputs are assumed to be 3-d numpy ndarrays, and it just returns the dot product of the first slices of the inputs.

inline xt::pyarray<double> dot3d(xt::pyarray<double> &ar0,
                                 xt::pyarray<double> &ar1)

{
  xt::xarray<double> ar0_x(ar0);                                                                                                                                                                
  xt::xarray<double> ar1_x(ar1);                                                                                                                                                                
  auto out = xt::linalg::dot(xt::view(ar0_x, 0),
                             xt::view(ar1_x, 0));
  return xt::pyarray<double> (out);
}

This appears to compile without issues. Here's the output from the interpreter --

temp1 = np.random.randn(50).reshape(2,5,5)
temp2 = np.random.randn(20).reshape(2,5,2)
temp3 = am_ext.dot3d(temp1, temp2)
Traceback (most recent call last):
File "", line 1, in
RuntimeError: No valid layout chosen.
`

Is this an expected behavior?

The context here is, I'm trying to write a function that takes two numpy 3-d arrays and single 1-d index, and apply dot function on the corresponding slices of the inputs based on the index.

xt::linalg::lstsq is not working when M < N

The following minimal test fails

TEST(xlinalg, lstsq_m_lessthan_n_minimal)
{
    xarray<double> arg_0 = {{ 0., 1.}};
    xarray<double> arg_1 = {1.};

    auto res = xt::linalg::lstsq(arg_0, arg_1);
}

The error is "Could not find workspace size for gelsd."

According to http://www.netlib.org/lapack/explore-html/d7/d3b/group__double_g_esolve_ga94bd4a63a6dacf523e25ff617719f752.html#ga94bd4a63a6dacf523e25ff617719f752:
ldb should be >= max(1,max(M,N)).

From
https://github.com/QuantStack/xtensor-blas/blob/9725a505db5c24f4037f9df4691171037e9f7026/include/xtensor-blas/xlapack.hpp#L693-L702
this looks like b and ldb is not handled differently when M < N .

xt::linalg::inv fails for 1x1 matrices

xt::linalg::inv fails for 1x1 matrices, since xt::xarray::strides returns {0,0}. The following produces a runtime error:

#include <xtensor/xarray.hpp>
#include <xtensor-blas/xlinalg.hpp>

int main()
{
  xt::xarray<double> A = xt::zeros<double>({1, 1});
  auto B = xt::linalg::inv(A);
}

** On entry to DGETRF, parameter number  4 had an illegal value
info = -4
Assertion failed: (info>=0), function getrf, file XXX/xflens/cxxlapack/interface/getrf.tcc, line 87.

A lot of unused files from FLENS get installed by cmake (unused cmake lists, lapack bundle)

and other things.

We should prevent this either:

by removing them from the source tree
or by adding exclude pattern for the files type.

`xt::linalg::dot()` is not working with `xarray_adaptor<>` when shape is 2x2

This is shown in the following code example.

#include "xtensor/xadapt.hpp"
#include "xtensor-blas/xlinalg.hpp"

int main()
{
        std::vector<double> vec{0, 1, 2, 3};
        std::vector<size_t> shape{2, 2};

        auto arr = xt::adapt(
                vec,
                shape
        );

        auto arr2 = xt::adapt(
                vec.data(),
                vec.size(),
                xt::no_ownership(),
                shape,
                std::vector<int>{2, 1}
        );

        std::cout << arr << std::endl;
        std::cout << xt::linalg::dot(arr, arr) << std::endl;
        std::cout << arr2 << std::endl;
        std::cout << xt::linalg::dot(arr2, arr2) << std::endl;

        return 0;
}

Output:

{{ 0.,  1.},
 { 2.,  3.}}
{{  2.,   3.},
 {  6.,  11.}}
{{ 0.,  1.},
 { 2.,  3.}}
terminate called after throwing an instance of 'std::runtime_error'
  what():  No valid layout chosen.

This is caused by the dynamic layout_type and get_leading_stride() in https://github.com/QuantStack/xtensor-blas/blob/master/include/xtensor-blas/xblas_utils.hpp#L129.

Linker error with xtensor-blas on xlfens

Hi, I am trying to use xtensor-blas on Windows. I have followed the steps to build it from source using Cmake properly and had no issues with that.
I am using MSVC to compile one of the examples that were given.

When I run: "cl xtens.cpp /I C:\Program1\include"

Output:
Microsoft (R) C/C++ Optimizing Compiler Version 19.16.27027.1 for x86
Copyright (C) Microsoft Corporation. All rights reserved.

xtens.cpp
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\xlocale(319): warning C4530: C++ exception handler used, but unwind semantics are not enabled. Specify /EHsc
Microsoft (R) Incremental Linker Version 14.16.27027.1
Copyright (C) Microsoft Corporation. All rights reserved.

/out:xtens.exe
xtens.obj
xtens.obj : error LNK2019: unresolved external symbol _cblas_ddot referenced in function "void __cdecl cxxblas::dot(int,double const *,int,double const *,int,double &)" (??$dot@H@cxxblas@@YAXHPBNH0HAAN@Z)
xtens.obj : error LNK2019: unresolved external symbol _cblas_dgemv referenced in function "void __cdecl cxxblas::gemv(enum cxxblas::StorageOrder,enum cxxblas::Transpose,int,int,double,double const *,int,double const *,int,double,double *,int)" (??$gemv@H@cxxblas@@YAXW4StorageOrder@0@W4Transpose@0@HHNPBNH2HNPANH@Z)
xtens.obj : error LNK2019: unresolved external symbol _cblas_dgemm referenced in function "void __cdecl cxxblas::gemm(enum cxxblas::StorageOrder,enum cxxblas::Transpose,enum cxxblas::Transpose,int,int,int,double,double const *,int,double const *,int,double,double *,int)" (??$gemm@H@cxxblas@@YAXW4StorageOrder@0@W4Transpose@0@1HHHNPBNH2HNPANH@Z)
xtens.obj : error LNK2019: unresolved external symbol _cblas_dsyrk referenced in function "void __cdecl cxxblas::syrk(enum cxxblas::StorageOrder,enum cxxblas::StorageUpLo,enum cxxblas::Transpose,int,int,double,double const *,int,double,double *,int)" (??$syrk@H@cxxblas@@YAXW4StorageOrder@0@W4StorageUpLo@0@W4Transpose@0@HHNPBNHNPANH@Z)
xtens.exe : fatal error LNK1120: 4 unresolved externals`

Program1\include\ folder contains folders: xtensor,xtensor-blas,xflens,xtl which contain the header fo respective libraries.
How can I solve this problem?

Trouble finding xtensor-blas in CMakeLists.txt

I am hardcoding the xtensor-blas path in my CMakeLists.txt right now, which is obviously not ideal.

project(KALMANPP)

cmake_minimum_required(VERSION 3.1)

# Set flags and params
set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_FLAGS_DEBUG "-Wall -g -W")

=====>  include_directories(/usr/local/include/xtensor-blas/)  <=====

find_package(XTENSOR REQUIRED)
find_package(BLAS)
find_package(LAPACK)

add_executable(kalmanpp src/Main.cpp)
target_link_libraries(kalmanpp lapack blas)

I installed xtensor-blas via cmake, identical to the xtensor. All files seem to be installed correctly, /usr/local/include/xtensor-blas/ on my Mac. I'm not an expert with CMake, but looking at your CMakeLists.txt, I can't find any "find_package(XTENSOR_BLAS)" or similar. How can I include it in cmake?

Thanks!

Inconsistent install path for xtensor-blasConfig.cmake with respect to the xtensor family

Hi Guys,

As said in the title, xtensor-blasConfig.cmake is installed by default in share/cmake/xtensor-blas folder while for the rest of the family such files are installed in lib/cmake/family-name.

Of course, this not urgent :-) !

Cheers,

Thibaud.

xt::blas::gemm doesn't work for non-square matrix and transpose_A is true

Hey guys,
I noticed that there's a bug in the gemm function. If you try to calculate M = X'*X in the following way:

xt::xarray<double> X = {{1, 2, 3},
                        {1, 2, 3}};
std::vector<int> shape = {3, 3};
xt::xarray<double> M;
M.reshape(shape);

xt::blas::gemm(X, X, M, xt::xscalar<double>(1), xt::xscalar<double>(0), true, false);

Cheers

Fix norm interface to match xtensor

As per xtensor-stack/xtensor#417

Support multiple dimensions in cross, pinv and solve

Cross
At the moment, cross seems to be limited to 1D vectors of size 2 or 3. Numpy doesn't have the constraint for dimensionality and is capable of computing cross for higher dimensionality arrays if the last dimension has length of 2 or 3.


xt::xarray<double> firstArray = xt::ones<double>({ 3, 4, 2 });
xt::xarray<double> secondArray = xt::ones<double>({ 3, 4, 2 });

auto result = xt::linalg::cross(firstArray, secondArray);
std::runtime_error("a or b did not have appropriate size 2 or 3.");

NumPy:

a = numpy.ones([3, 4, 2])
b = numpy.ones([3, 4, 2])
numpy.cross(a, b)
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
numpy.cross(a, b).shape
(3, 4)

Pinv

Pinv seems to support only 1D or 2D arrays. NumPy does support higher dimensionality.


xt::xarray<double> a = xt::ones<double>({ 1, 2, 3, 4});

auto result = xt::linalg::pinv(a);

XTENSOR_ASSERT_MSG(N == e.derived_cast().dimension(), "Cannot change dimension of xtensor.");

NumPy:

numpy.linalg.pinv(numpy.ones([1, 2, 3, 4]))
array([[[[0.08333333, 0.08333333, 0.08333333],
         [0.08333333, 0.08333333, 0.08333333],
         [0.08333333, 0.08333333, 0.08333333],
         [0.08333333, 0.08333333, 0.08333333]],

        [[0.08333333, 0.08333333, 0.08333333],
         [0.08333333, 0.08333333, 0.08333333],
         [0.08333333, 0.08333333, 0.08333333],
         [0.08333333, 0.08333333, 0.08333333]]]])


a = numpy.array([[[1, 2, 3]]])
a.shape
(1, 1, 3)
numpy.linalg.pinv(a)
array([[[0.07142857],
        [0.14285714],
        [0.21428571]]])

Solve
Solve seems to be limited to <= 2D by XTENSOR_ASSERT.

auto a = xt::xarray<double>({ {0.72999173, 0.94937934, 0.6160863},
                            {0.67719638, 0.11662088, 0.86027322},
                            {0.22839025, 0.14736096, 0.36511296} });
auto b = xt::xarray<double>({ {{0.42682534, 0.18769713},
                            {0.09106305, 0.93373785},
                            {0.3472225 , 0.05267208}},

                            {{0.09015187, 0.94741843},
                            {0.85322852, 0.25576913},
                            {0.47621316, 0.5724512}} });
auto result = xt::linalg::solve(a, b);
XTENSOR_ASSERT failed.

NumPy:

a = numpy.random.rand(9).reshape(3,3)
b = numpy.random.rand(12).reshape(2, 3, 2)
numpy.linalg.solve(a, b)
array([[[-3.08713549,  3.33567358],
        [ 1.29121029, -1.49942674],
        [ 2.36096848, -1.33713771]],

       [[-1.99462003, -4.35353126],
        [-0.03715958,  2.11458264],
        [ 2.56698863,  3.43769886]]])

Having ability to compute this functions with higher dimensionality arrays would help anyone doing more demanding calculations.

Use calculated size from shape

In order to not run into the issue with xt::pyarray we should use the size computed from shape here:

https://github.com/QuantStack/xtensor-blas/blob/a06ba041309a841fbc27679a61105b95913653d2/include/xtensor-blas/xlinalg.hpp#L67

Using XTENSOR_USE_TBB with xtensor-blas fails compilation

Hi, I am trying to use both xtensor-blas and TBB acceleration at the same time. However, the compilation fails.

I am using the following CMakeLists.txt

cmake_minimum_required(VERSION 3.1)
project(test_blas)

# Try to use only anaconda provided stuff
set(CMAKE_PREFIX_PATH /Users/scheibler/anaconda3/envs/mixiva ${CMAKE_PREFIX_PATH})

find_package(xtl REQUIRED)
find_package(xtensor REQUIRED)

add_executable(test_blas src/test_blas.cpp)

OPTION(XTENSOR_USE_XSIMD "simd acceleration for xtensor" ON)
OPTION(XTENSOR_USE_TBB "enable parallelization using intel TBB" ON)

if(XTENSOR_USE_XSIMD)
    set(xsimd_REQUIRED_VERSION 7.0.0)
    find_package(xsimd ${xsimd_REQUIRED_VERSION} REQUIRED)
    message(STATUS "Found xsimd: ${xsimd_INCLUDE_DIRS}/xsimd")
endif()

if(XTENSOR_USE_TBB)
    set(CMAKE_MODULE_PATH "${CMAKE_MODULE_PATH}" "${CMAKE_CURRENT_SOURCE_DIR}/cmake/")
    find_package(TBB REQUIRED)
    message(STATUS "Found intel TBB: ${TBB_INCLUDE_DIRS}")
endif()

if(MSVC)
    target_compile_options(test_blas PRIVATE /EHsc /MP /bigobj)
    set(CMAKE_EXE_LINKER_FLAGS /MANIFEST:NO)
endif()

if (CMAKE_CXX_COMPILER_ID MATCHES "Clang" OR
    CMAKE_CXX_COMPILER_ID MATCHES "GNU" OR
    (CMAKE_CXX_COMPILER_ID MATCHES "Intel" AND NOT WIN32))
    target_compile_options(test_blas PRIVATE -march=native -std=c++14)
endif()

add_definitions(-DHAVE_CBLAS=1)
add_definitions(-DXTENSOR_USE_TBB)
add_definitions(-DXTENSOR_USE_SIMD)

# set(BLA_VENDOR Intel10_64lp)

if (WIN32)
    find_package(OpenBLAS REQUIRED)
    set(BLAS_LIBRARIES ${CMAKE_INSTALL_PREFIX}${OpenBLAS_LIBRARIES})
else()
    find_package(BLAS REQUIRED)
    find_package(LAPACK REQUIRED)
endif()

message(STATUS "BLAS VENDOR:    " ${BLA_VENDOR})
message(STATUS "BLAS LIBRARIES: " ${BLAS_LIBRARIES})

target_link_libraries(test_blas xtensor ${BLAS_LIBRARIES} tbb)

And the code is

#include <iostream>
#include <xtensor/xtensor.hpp>
#include <xtensor/xrandom.hpp>
#include <xtensor-blas/xlinalg.hpp>

int main()
{
  xt::xtensor<double, 2, xt::layout_type::row_major> x = xt::random::randn<double>({1000, 1000});
  xt::xtensor<double, 2, xt::layout_type::row_major> y = xt::random::randn<double>({1000, 1000});
  auto y_T = xt::transpose(y);

  xt::xtensor<double, 2> z = xt::linalg::dot(x, y_T);

  for (size_t i ; i < 10 ; i++)
    z = xt::linalg::dot(x, y_T);
  std::cout << z(0, 0) << std::endl;
}

The output of compilation is this.

make test_blas
Scanning dependencies of target test_blas
[ 50%] Building CXX object CMakeFiles/test_blas.dir/src/test_blas.cpp.o
In file included from /Users/scheibler/Documents/Research/Projects/Current/Separation/mixiva/test_xtensor/src/test_blas.cpp:5:
In file included from /Users/scheibler/anaconda3/envs/mixiva/include/xtensor-blas/xlinalg.hpp:29:
In file included from /Users/scheibler/anaconda3/envs/mixiva/include/xtensor-blas/xlapack.hpp:23:
In file included from /Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/cxxlapack.cxx:37:
In file included from /Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/cxxlapack.tcc:36:
In file included from /Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/interface/interface.tcc:36:
In file included from /Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/interface/bbcsd.tcc:39:
In file included from /Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/netlib.h:25:
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:973:39: error:
      expected ')'
                    const FLOAT      *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:968:20: note:
      to match this '('
LAPACK_IMPL(cggbak)(const char       *JOB,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:990:39: error:
      expected ')'
                    FLOAT            *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:982:20: note:
      to match this '('
LAPACK_IMPL(cggbal)(const char       *JOB,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:1087:39: error:
      expected ')'
                    FLOAT            *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:1070:20: note:
      to match this '('
LAPACK_IMPL(cggevx)(const char       *BALANC,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:6787:39: error:
      expected ')'
                    const DOUBLE     *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:6782:20: note:
      to match this '('
LAPACK_IMPL(dggbak)(const char       *JOB,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:6804:39: error:
      expected ')'
                    DOUBLE           *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:6796:20: note:
      to match this '('
LAPACK_IMPL(dggbal)(const char       *JOB,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:6902:39: error:
      expected ')'
                    DOUBLE           *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:6884:20: note:
      to match this '('
LAPACK_IMPL(dggevx)(const char       *BALANC,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:13109:39: error:
      expected ')'
                    const FLOAT      *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:13104:20: note:
      to match this '('
LAPACK_IMPL(sggbak)(const char       *JOB,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:13126:39: error:
      expected ')'
                    FLOAT            *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:13118:20: note:
      to match this '('
LAPACK_IMPL(sggbal)(const char       *JOB,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:13224:39: error:
      expected ')'
                    FLOAT            *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:13206:20: note:
      to match this '('
LAPACK_IMPL(sggevx)(const char       *BALANC,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:19244:39: error:
      expected ')'
                    const DOUBLE     *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:19239:20: note:
      to match this '('
LAPACK_IMPL(zggbak)(const char       *JOB,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:19261:39: error:
      expected ')'
                    DOUBLE           *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:19253:20: note:
      to match this '('
LAPACK_IMPL(zggbal)(const char       *JOB,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:19358:39: error:
      expected ')'
                    DOUBLE           *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:19341:20: note:
      to match this '('
LAPACK_IMPL(zggevx)(const char       *BALANC,
                   ^
12 errors generated.
make[3]: *** [CMakeFiles/test_blas.dir/src/test_blas.cpp.o] Error 1
make[2]: *** [CMakeFiles/test_blas.dir/all] Error 2
make[1]: *** [CMakeFiles/test_blas.dir/rule] Error 2
make: *** [test_blas] Error 2

The code is compiled on Mac OS X Mojave (10.14.4) with Apple LLVM

Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/c++/4.2.1
Apple LLVM version 10.0.1 (clang-1001.0.46.4)
Target: x86_64-apple-darwin18.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

When I add #undef XTENSOR_USE_TBB at the beginning of the code, it compiles and runs as expected.

`find_package(xtensor-blas REQUIRED)` does not expose targets

How should user consume xtensor-blas? I discovered that there is an xtensor_blas_INCLUDE_DIRS that I can use to populate targets in my library via target_include_dirs but I would prefer to simply use target_link_libraries(mylib <LINKAGE> xtensor_blas).

Add some tests with stride=0

xtensor has an optimization where a stride can be 0.
We should add some tests to cover this special stride length, and make sure that the BLAS implementations we're testing against are fine with it.

dot product is slow!

I'm trying to reach the same performance as NumPy. With this code below I evaluate the speed of dot product using xtensor-blas:
benchmark.cpp

#include <iostream>
#include "xtensor/xarray.hpp"
#include "xtensor/xrandom.hpp"
#include "xtensor-blas/xlinalg.hpp"


xt::xtensor<double, 2> my_dot(xt::xtensor<double, 2> &A, xt::xtensor<double, 2> &B){
    auto start = std::chrono::steady_clock::now();

    xt::xtensor<double, 2> C = xt::linalg::dot(A, B);

    auto end = std::chrono::steady_clock::now();
    std::cout << std::chrono::duration<double, std::milli>(end - start) .count()
                << " ms" << std::endl;
    return C;
}


int main(int argc, char *argv[])
{
    for (int i=0;i<3;i++){
        xt::xtensor<double, 2> A = xt::random::randn({1000,2000}, 0., 1.);
        xt::xtensor<double, 2> B = xt::random::randn({2000,2000}, 0., 1.);

        xt::xtensor<double, 2> C = my_dot(A, B);
    }
    return 0;
}

I compile this with this command:

$ g++ benchmark.cpp -O3 -mavx2 -ffast-math \
 -I/home/--user--/install_path/include \
 -lblas -llapack \
 -DHAVE_CBLAS=1 \
 -o a.out

and I get

$ ./a.out
2395.9 ms
2415.98 ms
2461.93 ms

(Install_path/include contains all necessary libraries I.e xtensor, xtensor-blas...)
While when I multiply arrays of the same size in NumPy I get

$ python benchmark.py
76.17 ms
76.05 ms
79.00 ms

So this is mush SLOWER!
I've installed sudo apt-get install libblas-dev liblapack-dev
System:

Ubuntu: 18.04
gcc: 7.4.0
latest versions of xtensor libraries

Could you please help me with this?

xt::linalg multiplication always yields 0 vectors for random input array

When mulitplying two arrays, the result is always vector of 0. (shape agrees, tho).

auto x = xt::random::randn<double>({784});
auto weights = xt::xarray<double> ({1});  // should perform identity

auto res = xt::linalg::outer(x, xt::transpose(weights));
auto res = xt::linalg::tensordot(x, xt::transpose(weights), 0);   // same result, also same without transpose

The output was observed in Jupyter Notebook... however, when compiled and executed in my program, I get even stranger error:

Parameter 8 to routine cblas_dger was incorrect

Quick help appreciated!

EDIT: This seems to be the case only for xt::random::randn. When using xt::arange, the output is as expected.

xflens defining colliding macros

xflens defines a number of macros that have names that can easily collide such as INTEGER or FLOAT.

https://github.com/QuantStack/xtensor-blas/blob/0.16.1/include/xflens/cxxlapack/netlib/interface/lapack.in.h#L3-L33

This is an issue when using e.g. both symengine an xtensor-blas in the same notebook.

get_leading_stride() is too restrictive

I tried to do the following:

xtensor<double, 1> A{1, 2, 3, 4, 5}, 
                   B{0.25, 0.5, 0.25}, 
                   C(B.shape(), 0);

// create a view to A looking like
//    1, 2, 3,
//    2, 3, 4,
//    3, 4, 5
static_shape<std::size_t, 2> shape{3, 3}, strides{1, 1};
auto A_as_matrix = adapt(&A(0), 9, no_ownership(), shape, strides);

// call matrix multiplication
C = linalg::dot(A_as_matrix, B);

but got the error message "No valid layout chosen." from get_leading_stride() (xblas_utils.hpp, line 118). However, when I hard-wired get_leading_stride() to return 1, I got the desired result, so my general idea seems to be valid. How can this be solved?

test_tensordot fails to compile

Current git version tests fail to compile when building test_tensordot.cpp.o.
The message (repeated several times) is:
/usr/include/xtensor/xiterator.hpp:324:86: error: invalid use of incomplete type ‘struct xt::detail::LAYOUT_FORBIDEN_FOR_XITERATOR<(xt::layout_type)0>’
using checking_type = typename detail::LAYOUT_FORBIDEN_FOR_XITERATOR::type;
^
/usr/include/xtensor/xiterator.hpp:271:16: note: declaration of ‘struct xt::detail::LAYOUT_FORBIDEN_FOR_XITERATOR<(xt::layout_type)0>’
struct LAYOUT_FORBIDEN_FOR_XITERATOR;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is with gcc version 8.3.1 20190223 (Red Hat 8.3.1-2) (GCC)

MKL linkage on dot product issues general protection fault

Simple reproducible:

    xt::xarray<float> __im2col = {{ 2.,  3.,  3.,  4.},
                                      { 2.,  1.,  3.,  2.}};
    xt::xarray<float> __f = {{ 2.,  3.,  3.},
                                  { 1.,  4.,  5.},
                                  { 1.,  3.,  5.},
                                  { 2.,  2.,  2.}};
    auto _result = xt::linalg::dot(__im2col, __f);

Gives a stack trace in lldb:

EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
         frame #0: 0x000000010067d894 unit_tests`mkl_blas_avx2_dgemm_kernel_nocopy_TN_b0 + 48724
     unit_tests`mkl_blas_avx2_sgemm_kernel_nocopy_TN_b0:
     ->  0x10067d894 <+48724>: vgatherqpd %ymm6, -0x80(%rcx,%ymm7), %ymm0
         0x10067d89b <+48731>: vmovaps %ymm3, %ymm6
         0x10067d89f <+48735>: vgatherqpd %ymm6, -0x78(%rcx,%ymm7), %ymm1
         0x10067d8a6 <+48742>: vmovaps %ymm3, %ymm6

When I link with cblas instead, it does not fault.

I am using apple clang:

(idp3) ❯ g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 9.1.0 (clang-902.0.39.2)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

It is tested with gcc-8 that this error does NOT occur.

xt::linalg::trace does not support complex arrays

The xt::linalg::trace cannot deal with complex arrays, which is reflected in this source code (line 1315):

template <class T>
    auto trace(const xexpression<T>& M, int offset = 0, int axis1 = 0, int axis2 = 1)
    {
        const auto& dM = M.derived_cast();
        auto d = xt::diagonal(dM, offset, std::size_t(axis1), std::size_t(axis2));
        
        std::size_t dim = d.dimension();
        if (dim == 1)
        {
            return xt::xarray<double>(xt::sum(d)());
        }
        else
        {
            return xt::xarray<double>(xt::sum(d, {dim - 1}));
        }
    }

I tried to modify this part directly by changing xarray<double> to xarray<std::complex<double>>. But after I do this, I get another error from xutils.hpp of xtensor(line 628):

template <class T>
    struct conditional_cast_functor<true, T>
    {
        template <class U>
        inline auto operator()(U&& u) const
        {
            return static_cast<T>(std::forward<U>(u));
        }
    };

Error:

Cannot convert 'const std::__1::complex<double>' to 'double' without a conversion operator

Although there is a workaround (tracing the real and the complex parts separately), this is not convenient. Could you please look into this problem and fix it? Thanks!

Buffer overflow when dotting 2 xarray

Hello!
I found a buffer overflow in xtensor-blas while dotting 2 array trough address sanitizer.

Source:

#include <iostream>

#include <xtensor/xarray.hpp>
#include <xtensor/xio.hpp>
#include <xtensor-blas/xlinalg.hpp>

using namespace std;


int main()
{
	auto a = xt::zeros<float>({4,2});
	auto b = xt::zeros<float>({2,5});
	cout << xt::linalg::dot(a, b) << endl;
	return 0;
}

Compiled with parameters -O3 -fsanitize=address on GCC and clang and running the binary results in the following.

=================================================================
==9741==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x604000000078 at pc 0x564c811eda8b bp 0x7ffccb3e7300 sp 0x7ffccb3e72f8
READ of size 4 at 0x604000000078 thread T0
    #0 0x564c811eda8a in void cxxblas::gemv_generic<int, float, float, float, float, float>(cxxblas::StorageOrder, cxxblas::Transpose, cxxblas::Transpose, int, int, float const&, float const*, int, float const*, int, float const&, float*, int) (/home/marty/文件/xtensor-test/main+0x11ea8a)
    #1 0x564c811f1fe1 in void cxxblas::gemm_generic<int, float, float, float, float, float>(cxxblas::StorageOrder, cxxblas::Transpose, cxxblas::Transpose, int, int, int, float const&, float const*, int, float const*, int, float const&, float*, int) (/home/marty/文件/xtensor-test/main+0x122fe1)
    #2 0x564c811e7814 in auto xt::linalg::dot<xt::xbroadcast<xt::xscalar<float const>, std::array<unsigned long, 2ul> >, xt::xbroadcast<xt::xscalar<float const>, std::array<unsigned long, 2ul> > >(xt::xexpression<xt::xbroadcast<xt::xscalar<float const>, std::array<unsigned long, 2ul> > > const&, xt::xbroadcast<xt::xscalar<float const>, std::array<unsigned long, 2ul> ><xt::xbroadcast<xt::xscalar<float const>, std::array<unsigned long, 2ul> > > const&) (/home/marty/文件/xtensor-test/main+0x118814)
    #3 0x564c811e6201 in main (/home/marty/文件/xtensor-test/main+0x117201)
    #4 0x7f95181a2f49 in __libc_start_main (/usr/lib/libc.so.6+0x20f49)
    #5 0x564c810ea989 in _start (/home/marty/文件/xtensor-test/main+0x1b989)

0x604000000078 is located 0 bytes to the right of 40-byte region [0x604000000050,0x604000000078)
allocated by thread T0 here:
    #0 0x564c811e1c71 in operator new(unsigned long) (/home/marty/文件/xtensor-test/main+0x112c71)
    #1 0x564c811ea010 in void xt::xstrided_container<xt::xtensor_container<xt::uvector<float, std::allocator<float> >, 2ul, (xt::layout_type)255, xt::xtensor_expression_tag> >::reshape<std::array<unsigned long, 2ul> >(std::array<unsigned long, 2ul>&&, bool) (/home/marty/文件/xtensor-test/main+0x11b010)

SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/marty/文件/xtensor-test/main+0x11ea8a) in void cxxblas::gemv_generic<int, float, float, float, float, float>(cxxblas::StorageOrder, cxxblas::Transpose, cxxblas::Transpose, int, int, float const&, float const*, int, float const*, int, float const&, float*, int)
Shadow bytes around the buggy address:
  0x0c087fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c087fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c087fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c087fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c087fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c087fff8000: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00[fa]
  0x0c087fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c087fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c087fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c087fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c087fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==9741==ABORTING

No idea if this is a xtensor-blas bug or a BLAS library bug thought.

Many thanks.

non-contiguous array

This is the simplest example that I can construct that still produce this error.

xt::pytensor<double, 2> dot_test(xt::pytensor<double, 3> & ar0,
                                 xt::pytensor<double, 2> & ar1,
                                 xt::pytensor<long, 1> & idx) {
  auto shape0 = ar0.shape();
  auto shape1 = ar1.shape();
  xt::pytensor<double, 2> out({shape0[0], shape0[1]});
  out.fill(NAN);
  for (unsigned j=0; j<len(idx); ++j)
    xt::view(out, j) =
      xt::linalg::dot(xt::view(ar0, j),
                      xt::view(ar1, idx(j)));
  return out;
}

If we give a non-contiguous numpy array as the first argument, it fails.

m1 = np.random.randn(30).reshape(2,3,5)
m2 = np.random.randn(4).reshape(2,2)
xt_tools.dot_test(m1[:,:,2:4], m2, np.arange(2))

Traceback (most recent call last):
File "", line 1, in
RuntimeError: No valid layout chosen.

**As the issue I raised a year ago, replacing the dot product line with the following somehow solves the problem. But this is probably not a good permanent fix.
xt::linalg::dot(xt::eval(xt::view(ar0, j)), xt::eval(xt::view(ar1, idx(j))));

import fail -- Symbol not found: _dgetrf_

I used the cookiecutter to start my own development environment (https://github.com/QuantStack/xtensor-python-cookiecutter). After verifying that it works, I simply added a new function that involves xt::linalg::inv to test out the linear algebra library.

inline xt::pyarray<double> example3(xt::pyarray<double> &m) { return xt::linalg::inv(m); }

And also added the include lines

#include "xtensor-blas/xblas.hpp" #include "xtensor-blas/xlinalg.hpp"

It did compile with the existing setup.py, but upon import I get this error from the interpreter.

ImportError: dlopen(xtensor-example-extension/xtensor_example_extension.cpython-36m-darwin.so, 2): Symbol not found: dgetrf
Referenced from: xtensor-example-extension/xtensor_example_extension.cpython-36m-darwin.so
Expected in: flat namespace

Does anyone know why this is happening? Do I need to add something to setup.py?

crash on destruction of result of dot

Next code crashes on destruction of a3:

    {
        xt::xarray<double> a1{1, 2};
        {
            xt::xarray<double> a2{
                {1, 1, 1},
                {1, 2, 3}
            };
            {
                xt::xarray<double> a3 = xt::linalg::dot(a1, a2);
                std::cout << a3 << std::endl;
            } // crashes here
        }
    }

Also, I expected (3 5 7) as result but the console says {3, 5} (wrong result, wrong shape).
Is this a bug? Or, maybe I'm using xtensor-blas in wrong way, here?

xt::linalg::pinv crashes with both Visual Studio 2015 and Visual Studio 2017

I'm using the xTensor master, with xTL 0.5.4 and OpenBlas (v0.2.14.1) on the Windows 10 x64.

When I tried to use xt::linalg::pinv as in QuanStack/xtensor-blas/test/test_linalg.cpp, line 225 in the debug mode:

xt::xtensor<float, 2> d1 {{1.f, 2.f}}; auto r1 = xt::linalg::pinv(d1);

With Visual Studio 2017, the test crashed right away with the following error:

Message: Invalid parameter detected in function std::array<__int64, 1>::operator [], c:\program files (x86)\microsoft visual studio\2017\enterprise\vc\tools\msvc\14.16.27023\include\array line 184, Expression: “array subscript out of

range”

With Visual Studio 2015, the test passed. However it would crash with the same error as in Visual Studio 2017 when the debugger is attached.

Is it a known issue? Any suggestion?

The norm doesn't support float

Support for float precision is missing

xt::linalg::norm(xt::arange<float>(15), 1); // fails to build
xt::linalg::norm(xt::arange<double>(15), 1); // build ok

Added test to showcase this
#127

xtensor-blas documentation fails to build on RTD

Due to "Command killed due to excessive memory consumption/".

We might be able to get around this by ignoring FLENS headers.

Build fails with xtensor 0.20.4

xt::linalg::inv fails with complex scalar

xt::linalg::inv uses LAPACK's getri, which, in the first call, returns the size of the workspace needed. For matrices with std::complex<double> scalar type, the following line fails:

https://github.com/QuantStack/xtensor-blas/blob/f4100bdba77a29e8032e0cd52d824892161cccb5/include/xtensor-blas/xlapack.hpp#L457

Improve handling of empty arrays in dot and solve

At the moment, at least dot and solve seems to handle empty arrays improperly compared to NumPy.

dot

xt::xarray<double> a = xt::ones<double>({ 1, 2, 0, 1});
xt::xarray<double> b = xt::ones<double>({ 1, 0 });
auto result = xt::linalg::dot(a, b);
access violation - cblas_ddot

NumPy:

a = numpy.ones([1, 2, 0, 1])
b = numpy.ones([1, 0])
numpy.dot(a, b)
array([], shape=(1, 2, 0, 0), dtype=float64)

solve

auto a = xt::ones<double>({ 0, 0 });
auto b = xt::ones<double>({ 0 });
auto result = xt::linalg::solve(a, b);
C++ runtime assertion failed
File: gesv.tcc
Line: 98

Expression: info>=0

Numpy:

a = numpy.ones([0, 0])
b = numpy.ones([0])
numpy.linalg.solve(a, b)
array([], dtype=float64)

Add documentation how to link with LAPACK and BLAS

We need to enhance the docs (also in response to #63)

No installpath/include/xtensor-blas/flens directory after installation

Hi,

When I install xtensor-blas, the content of the directory xtensor-blas/flens is directly copied into installpath/include/ . It follows that the variable FLENS_INCLUDE_DIR provided by xtensor-blasConfig.cmake is not defined.
Furthermore, the project name in the xtensor-blasConfig.cmake writes xtensor_blas while one could expect to find xtensor-blas instead.

If I correct CMakeLists by appending xtensor-blas/flens for the install include path and if I use variable ${xtensor_blas_INCLUDE_DIRS}, I am able to use xtensor-blas as third-party library.

By the way, I really like the work you are doing ;-) !

Cheers,

Thibaud Kloczko
[email protected]

Use get_leading_stride more consistently

The ger implementation should proably use get_leading_stride as well (eventually the other blas functions, too).

https://github.com/QuantStack/xtensor-blas/blob/master/include/xtensor-blas/xblas.hpp#L212-L233

If cling loads libblas, it should also define HAVE_CBLAS

otherwise, we still use the FLENS overloads.

xtensor-stack / xtensor-blas Goto Github PK

xtensor-blas's People

Contributors

Stargazers

Watchers

Forkers

xtensor-blas's Issues

Recommend Projects

Recommend Topics

Recommend Org