Giter VIP home page Giter VIP logo

xtensor-blas's People

Contributors

agoose77 avatar amjames avatar derthorsten avatar drew-parsons avatar emmenlau avatar iamthebot avatar james-d-mitchell avatar johanmabille avatar jonathonl avatar kyungminlee avatar luk036 avatar masariello avatar papadop avatar pdumon avatar potpath avatar przemkovv avatar rjsberry avatar robertodr avatar schra avatar sylvaincorlay avatar tatatupi avatar tdegeus avatar vakokako avatar wolfv avatar

Stargazers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

Watchers

 avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar  avatar

xtensor-blas's Issues

OpenBLAS is disabled by default

To enable OpenBLAS, it is necessary to

  1. add a compile flag -DWITH_OPENBLAS
  2. copy cxxstd directory from FLENS repository

With these tips, the following code costs about 0.1 [s] in my computer whereas it costs 16 [s] without them:

template <size_t D> using xtsize = std::array<size_t, D>;

xt::xtensor<double, 2> a(xtsize<2>{2000, 2000}), b(xtsize<2>{2000, 2000});
xt::linalg::dot(a, b);

I would suggest to notice about WITH_OPENBLAS option in README.md with other blas options (such as WITH_MKLBLAS).

Semantics of xtensor's `data()` with views and negative strides

In xtensor, data() returns a raw pointer to a "first element" from which the position of any element can be computed from the strides and indices.

In the case of a view with negative strides, this may not be the first element of the buffer (it can even be the last if all strides are negative).

LAPACK assumes that the pointer passed to it is always the start of the buffer, and will internally compute the "first" element with respect to the striding scheme:

see e.g. here:

138          IF (incx.LT.0) ix = (-n+1)*incx + 1
139          IF (incy.LT.0) iy = (-n+1)*incy + 1

Hence in xtensor-blas, we compute the start of the buffer to pass it to LAPACK which does the inverse transformation. (This fix was added by @wolfv in #103).

To prevent that redundant computation, should we

  • change the semantics of data() to always point to the start of the buffer, which would invalidate the striding schemes on strided views.
  • provide another function?
  • not do anything?

Segmentation fault using dot for tensors with one empty dimension

I want to do matrix multiplication for two tensors with shape (1, 2, 2). Using a (2, 2) shape works, but using an array with an empty dimension (shape (1, 2, 2)) gives Process finished with exit code 139 (interrupted by signal 11: SIGSEGV).
I am using xtensor-blas 0.16.1 from conda. I have the following blas library installed:

ii  libopenblas-base:amd64  0.3.5+ds-2  amd64  Optimized BLAS (linear algebra) library (shared library)
ii  libopenblas-dev:amd64   0.3.5+ds-2  amd64  Optimized BLAS (linear algebra) library (development files)

Code example:

#include <xtensor/xtensor.hpp>
#include <xtensor/xio.hpp>
#include "xtensor-blas/xlinalg.hpp"
template <typename T, size_t N>

std::ostream& operator<<(std::ostream& os, std::array<T, N> a) {
    os << "Shape(";
    for (auto& i : a) {
        os << i << " ";
    }
    os << ")";
    return os;
}

using complex = std::complex<double>;

void works() {
    using matrix_t = xt::xtensor<complex, 2>;
    matrix_t A{{std::cos(35), 0}, {0, 1}};
    matrix_t B{{1, -1}, {-1, 1}};
    std::cout << A << std::endl;
    std::cout << B << std::endl;
    std::cout << A.shape() << std::endl;
    std::cout << B.shape() << std::endl;
    auto G = xt::linalg::dot(A, B);
    std::cout << G << std::endl;
}

void fails() {
    using matrix_t = xt::xtensor<complex, 3>;
    matrix_t A{{{std::cos(35), 0}, {0, 1}}};
    matrix_t B{{{1, -1}, {-1, 1}}};
    std::cout << A << std::endl;
    std::cout << B << std::endl;
    std::cout << A.shape() << std::endl;
    std::cout << B.shape() << std::endl;
    auto G = xt::linalg::dot(A, B);
    std::cout << G << std::endl;
}

int main() {
    std::cout << "Working code: " << std::endl;
    works();
    std::cout << "\nSegmentation fault: " << std::endl;
    fails();
}

Output:

Working code: 
{{-0.903692+0.i,  0.      +0.i},
 { 0.      +0.i,  1.      +0.i}}
{{ 1.+0.i, -1.+0.i},
 {-1.+0.i,  1.+0.i}}
Shape(2 2 )
Shape(2 2 )
{{-0.903692+0.i,  0.903692+0.i},
 {-1.      +0.i,  1.      +0.i}}

Segmentation fault: 
{{{-0.903692+0.i,  0.      +0.i},
  { 0.      +0.i,  1.      +0.i}}}
{{{ 1.+0.i, -1.+0.i},
  {-1.+0.i,  1.+0.i}}}
Shape(1 2 2 )
Shape(1 2 2 )

EDIT:
I know using xt::squeeze to remove empty dimensions works. I am still curious why the error occurs.

No matching function for call to 'gesdd' when building with Xcode

I installed xtl, xtensor and xtensor-blas via conda. Versions:

libblas            conda-forge/osx-64::libblas-3.8.0-9_openblas
libcblas           conda-forge/osx-64::libcblas-3.8.0-9_openblas
liblapack          conda-forge/osx-64::liblapack-3.8.0-9_openblas
liblapacke         conda-forge/osx-64::liblapacke-3.8.0-9_openblas
openblas           conda-forge/osx-64::openblas-0.3.6-hd44dcd8_2
xtensor            conda-forge/osx-64::xtensor-0.20.5-h770b8ee_0
xtensor-blas       conda-forge/osx-64::xtensor-blas-0.16.1-h7f1c7fb_0
xtl                conda-forge/osx-64::xtl-0.6.4-h770b8ee_0

I use the following settings to build by Xcode project with xtensor:

Project type: command line tool

  1. Project - Build settings -

Search Paths:

  • Header search paths: /Users/[username]/anaconda3/include
  • Library search paths: /Users/[username]/anaconda3/lib

Apple Clang - Language - C++:

  • C++ language dialect: GNU++14 [-std=gnu++14]
  • C++ standard library: libc++ (LLVM C++ standard library with C++ 11 support)
  1. Target - Build phases

Link binary with libraries

  • libopenblasp-r0.3.6.a
  • libopenblasp-r0.3.6.dylib

(both these libraries are in the directory /Users/[username]/anaconda3/lib)

When building my project, Xcode raises the error No matching function for call to 'gesdd' in the file /Users/[username]/anaconda3/include/xtensor-blas/xlapack.hpp. Is there any solution to this problem?

I'm using Xcode 10.2.1 on macOS Mojave 10.14.4.

`xt::linalg::dot()` with `xt::view` does not work as expected

The following piece of code

#include <iostream>
#include <xtensor/xarray.hpp>
#include <xtensor/xio.hpp>
#include <xtensor/xview.hpp>
#include <xtensor-blas/xlinalg.hpp>

#define SHOW(x) { std::cout << (#x) << " = " << (x) << std::endl; }

int main(int argc, char** argv)
{
  using xt::placeholders::_;
  xt::xarray<double> A = {{2, 3}, {5, 7}, {11, 13}};
  auto A1 = xt::view(A, xt::range(_, 3), 0);
  auto A2 = xt::view(A, xt::range(-1, -3-1, -1), 1);
  xt::xarray<double> B1 = A1;
  xt::xarray<double> B2 = A2;

  SHOW(A);
  SHOW(A1);
  SHOW(A2);
  SHOW(xt::linalg::dot(A1, A2));
  SHOW(B1);
  SHOW(B2);
  SHOW(xt::linalg::dot(B1, B2));

  return 0;
}

produces the following output

A = {{  2.,   3.},
 {  5.,   7.},
 { 11.,  13.}}
A1 = {  2.,   5.,  11.}
A2 = { 13.,   7.,   3.}
xt::linalg::dot(A1, A2) = { 168.}
B1 = {  2.,   5.,  11.}
B2 = { 13.,   7.,   3.}
xt::linalg::dot(B1, B2) = { 94.}

Return types

Currently, xtensor-blas tries to return xtensors if the dimensions of the return type are known. I'm not sure if that's a good idea, and in some cases it changes the interface a bit (e.g. in lstsq the solution is always in 2D which is different from numpy).

Also, people might want to change the dimensionality of the return array.

Should we make the return type selectable, or use xarray?

xblas::gemm cannot take a variable of type xt::pyarray<double> as output

I tried some test code on xblas::gemm, and got some error message. I just wonder if I used xblas::gemm in a correct way. I got the test code from https://github.com/QuantStack/xtensor-blas/blob/master/test/test_blas.cpp, and made some changes.

The original code is shown as below. It can be compiled with no errors.

xt::xarray<double> X = {{1, 2, 3}, {1, 2, 3}};
auto M = xt::xarray<double>::from_shape({3, 3});
xt::blas::gemm(X, X, M, true, false, 1.0, 0.0);

Then, I made a change on the second line of the original code.

xt::xarray<double> X = {{1, 2, 3}, {1, 2, 3}};
auto M = xt::pyarray<double>::from_shape({3, 3});
xt::blas::gemm(X, X, M, true, false, 1.0, 0.0);

When I compiled the new code, I got the error message: "... xtensor-blas/xblas.hpp:176:40: error: static assertion failed: GEMM result layout cannot be dynamic. ..." Could you please explain the error in detail? Thanks a lot.

[linalg] cling xtensor dot product results inconsistent with numpy - `nan`'s

import sklearn
import numpy as np
import matplotlib.pyplot as plt
from sklearn import linear_model, datasets
logistic = linear_model.LogisticRegression()
digits = datasets.load_digits()
X_digits = digits.data.astype(int)
y_digits = digits.target.astype(int)
logistic.fit(X_digits, y_digits)

with open("weights.npy", "ab") as f:
    np.save(f, logistic.coef_)
with open("coefs.npy", "ab") as f:
    np.save(f, logistic.intercept_)
with open("mnist.npy", "ab") as f:
    np.save(f, X_digits)

print(X_digits.dot(logistic.coef_.T))

Yields:

[[ 16.04425395 -24.57188278 -18.99064139 ... -12.53595987  -6.59369383
   -8.84759654]
 [-39.86667809   9.89095239 -21.03906964 ... -29.23834092  -4.68151324
  -19.8681834 ]
 [-24.13828765  -2.48055447   8.55444032 ... -34.4661001    0.46635158
  -26.19119682]
 ...
 [-26.91305732  -5.43773514 -19.43945597 ... -34.21642645   8.02700404
  -21.49129071]
 [ -7.463867    -9.234329   -25.17197869 ... -26.92164295  -1.00534219
   12.71348187]
 [-25.10017597 -11.51306815 -20.65175452 ... -40.86844194   5.43239693
  -10.62263562]]

After deserializing the matrix and doing dot products on it in xtensor:

#include <iostream>
#include <xtensor/xarray.hpp>
#include <xtensor/xio.hpp>
#include <xtensor/xview.hpp>
#include <xtensor/xexpression.hpp>
#include <xtensor/xnpy.hpp>

#include <xtensor-blas/xlinalg.hpp>
xt::xarray<double> x = xt::load_npy<double>("mnist.npy");
xt::xarray<double> w = xt::load_npy<double>("weights.npy");
xt::xarray<double> b = xt::load_npy<double>("coefs.npy");

std::cout << xt::linalg::dot(x, xt::transpose(w));

yields:

{{ 16.044265, -24.571906, -18.990706, ..., -12.53596 ,  -6.593694,  -8.847597},
 {-39.866668,   9.890924, -21.039149, ..., -29.238341,  -4.681513, -19.868183},
 {-24.138276,  -2.480522,   8.554367, ..., -34.4661  ,   0.466352, -26.191197},
 ..., 
 {-26.913044,        nan, -19.439551, ..., -34.216426,   8.027004, -21.491291},
 { -7.463854,  -9.234249, -25.172056, ..., -26.921643,        nan,  12.713482},
 {       nan,        nan, -20.651845, ..., -40.868442,   5.432397,        nan}}

Which has the same floating point values in most, but contains nan in some fields which are likely due to floating point overflow or underflow (can't be divide by 0).

Environment info:

(cling) ❯ brew list | grep blas
openblas
(cling) ❯ conda list | grep xtensor
xtensor                   0.16.4                        0    QuantStack
xtensor-blas              0.11.1          blas_openblash2d50403_0  [blas_openblas]  conda-forge

Using OSX v10.13.6 High Sierra.

LNK2019 error

I have written this simple code:

#include <iostream>
#include <tuple>
#include <complex>
#include "xtensor/xarray.hpp"
#include "xtensor/xio.hpp"
#include "xtensor/xmath.hpp"
#include "xtensor/xview.hpp"
#include "xtensor/xcomplex.hpp"
#include "xtensor/xoperation.hpp"
#include "xtensor-blas/xlinalg.hpp"

using namespace std;
using namespace xt;

xarray<double> nxn_array(unsigned int);

int main()
{
	auto test = nxn_array(3); //array w/ values that go from 1 to 9
	auto test1 = ones_like(test);
	xarray<complex<double>> eig = linalg::eigvals(test1);
        return 0;
}

And it is outputting the linker error:

Error	LNK2019	unresolved external symbol _dgeev_ referenced in function "int __cdecl cxxlapack::geev<int>(char,char,int,double *,int,double *,double *,double *,int,double *,int,double *,int)" (??$geev@H@cxxlapack@@YAHDDHPANH000H0H0H@Z)	FileName C:\...\FileName\FileName\FileName.obj	1	

I don't really know anything about linker errors so I'm not really sure how to resolve this issue.

xblas::gemm taking xscalar<double/...> instead of double for alpha/beta

Hi @JohanMabille / @SylvainCorlay,

I'd like your input on this. Back in the day, I used xscalar<value_type> for alpha and beta as arguments. The reason was, that I was experimenting with some fancy template overload stuff which would automatically detect an xfunction with an xscalar and evaluate to the correct xblas overload if possible (it was experimental).

Now, it's kinda inconvenient to use the API directly (as in #26) as one needs to explicitly use xscalar(123.0) as argument.

Can we add an automatic conversion constructor (such that one can just pass the number and it get's automatically converted into an xscalar)? Or should we change the interface of the xblas functions to take value_type& instead?

Cheers,

Wolf

linalg::dot does not work well with view

Here is a simple extension of the dot function. The inputs are assumed to be 3-d numpy ndarrays, and it just returns the dot product of the first slices of the inputs.

inline xt::pyarray<double> dot3d(xt::pyarray<double> &ar0,
                                 xt::pyarray<double> &ar1)

{
  xt::xarray<double> ar0_x(ar0);                                                                                                                                                                
  xt::xarray<double> ar1_x(ar1);                                                                                                                                                                
  auto out = xt::linalg::dot(xt::view(ar0_x, 0),
                             xt::view(ar1_x, 0));
  return xt::pyarray<double> (out);
}

This appears to compile without issues. Here's the output from the interpreter --

`

temp1 = np.random.randn(50).reshape(2,5,5)
temp2 = np.random.randn(20).reshape(2,5,2)
temp3 = am_ext.dot3d(temp1, temp2)
Traceback (most recent call last):
File "", line 1, in
RuntimeError: No valid layout chosen.
`

Is this an expected behavior?

The context here is, I'm trying to write a function that takes two numpy 3-d arrays and single 1-d index, and apply dot function on the corresponding slices of the inputs based on the index.

xt::linalg::lstsq is not working when M < N

The following minimal test fails

TEST(xlinalg, lstsq_m_lessthan_n_minimal)
{
    xarray<double> arg_0 = {{ 0., 1.}};
    xarray<double> arg_1 = {1.};

    auto res = xt::linalg::lstsq(arg_0, arg_1);
}

The error is "Could not find workspace size for gelsd."

According to http://www.netlib.org/lapack/explore-html/d7/d3b/group__double_g_esolve_ga94bd4a63a6dacf523e25ff617719f752.html#ga94bd4a63a6dacf523e25ff617719f752:
ldb should be >= max(1,max(M,N)).

From
https://github.com/QuantStack/xtensor-blas/blob/9725a505db5c24f4037f9df4691171037e9f7026/include/xtensor-blas/xlapack.hpp#L693-L702
this looks like b and ldb is not handled differently when M < N .

xt::linalg::inv fails for 1x1 matrices

xt::linalg::inv fails for 1x1 matrices, since xt::xarray::strides returns {0,0}. The following produces a runtime error:

#include <xtensor/xarray.hpp>
#include <xtensor-blas/xlinalg.hpp>

int main()
{
  xt::xarray<double> A = xt::zeros<double>({1, 1});
  auto B = xt::linalg::inv(A);
}
** On entry to DGETRF, parameter number  4 had an illegal value
info = -4
Assertion failed: (info>=0), function getrf, file XXX/xflens/cxxlapack/interface/getrf.tcc, line 87.

`xt::linalg::dot()` is not working with `xarray_adaptor<>` when shape is 2x2

This is shown in the following code example.

#include "xtensor/xadapt.hpp"
#include "xtensor-blas/xlinalg.hpp"

int main()
{
        std::vector<double> vec{0, 1, 2, 3};
        std::vector<size_t> shape{2, 2};

        auto arr = xt::adapt(
                vec,
                shape
        );

        auto arr2 = xt::adapt(
                vec.data(),
                vec.size(),
                xt::no_ownership(),
                shape,
                std::vector<int>{2, 1}
        );

        std::cout << arr << std::endl;
        std::cout << xt::linalg::dot(arr, arr) << std::endl;
        std::cout << arr2 << std::endl;
        std::cout << xt::linalg::dot(arr2, arr2) << std::endl;

        return 0;
}

Output:

{{ 0.,  1.},
 { 2.,  3.}}
{{  2.,   3.},
 {  6.,  11.}}
{{ 0.,  1.},
 { 2.,  3.}}
terminate called after throwing an instance of 'std::runtime_error'
  what():  No valid layout chosen.

This is caused by the dynamic layout_type and get_leading_stride() in https://github.com/QuantStack/xtensor-blas/blob/master/include/xtensor-blas/xblas_utils.hpp#L129.

Linker error with xtensor-blas on xlfens

Hi, I am trying to use xtensor-blas on Windows. I have followed the steps to build it from source using Cmake properly and had no issues with that.
I am using MSVC to compile one of the examples that were given.

When I run: "cl xtens.cpp /I C:\Program1\include"

Output:
Microsoft (R) C/C++ Optimizing Compiler Version 19.16.27027.1 for x86
Copyright (C) Microsoft Corporation. All rights reserved.

xtens.cpp
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\VC\Tools\MSVC\14.16.27023\include\xlocale(319): warning C4530: C++ exception handler used, but unwind semantics are not enabled. Specify /EHsc
Microsoft (R) Incremental Linker Version 14.16.27027.1
Copyright (C) Microsoft Corporation. All rights reserved.

/out:xtens.exe
xtens.obj
xtens.obj : error LNK2019: unresolved external symbol _cblas_ddot referenced in function "void __cdecl cxxblas::dot(int,double const *,int,double const *,int,double &)" (??$dot@H@cxxblas@@YAXHPBNH0HAAN@Z)
xtens.obj : error LNK2019: unresolved external symbol _cblas_dgemv referenced in function "void __cdecl cxxblas::gemv(enum cxxblas::StorageOrder,enum cxxblas::Transpose,int,int,double,double const *,int,double const *,int,double,double *,int)" (??$gemv@H@cxxblas@@YAXW4StorageOrder@0@W4Transpose@0@HHNPBNH2HNPANH@Z)
xtens.obj : error LNK2019: unresolved external symbol _cblas_dgemm referenced in function "void __cdecl cxxblas::gemm(enum cxxblas::StorageOrder,enum cxxblas::Transpose,enum cxxblas::Transpose,int,int,int,double,double const *,int,double const *,int,double,double *,int)" (??$gemm@H@cxxblas@@YAXW4StorageOrder@0@W4Transpose@0@1HHHNPBNH2HNPANH@Z)
xtens.obj : error LNK2019: unresolved external symbol _cblas_dsyrk referenced in function "void __cdecl cxxblas::syrk(enum cxxblas::StorageOrder,enum cxxblas::StorageUpLo,enum cxxblas::Transpose,int,int,double,double const *,int,double,double *,int)" (??$syrk@H@cxxblas@@YAXW4StorageOrder@0@W4StorageUpLo@0@W4Transpose@0@HHNPBNHNPANH@Z)
xtens.exe : fatal error LNK1120: 4 unresolved externals`

Program1\include\ folder contains folders: xtensor,xtensor-blas,xflens,xtl which contain the header fo respective libraries.
How can I solve this problem?

Trouble finding xtensor-blas in CMakeLists.txt

I am hardcoding the xtensor-blas path in my CMakeLists.txt right now, which is obviously not ideal.

project(KALMANPP)

cmake_minimum_required(VERSION 3.1)

# Set flags and params
set(CMAKE_CXX_STANDARD 14)
set(CMAKE_CXX_FLAGS_DEBUG "-Wall -g -W")

=====>  include_directories(/usr/local/include/xtensor-blas/)  <=====

find_package(XTENSOR REQUIRED)
find_package(BLAS)
find_package(LAPACK)

add_executable(kalmanpp src/Main.cpp)
target_link_libraries(kalmanpp lapack blas)

I installed xtensor-blas via cmake, identical to the xtensor. All files seem to be installed correctly, /usr/local/include/xtensor-blas/ on my Mac. I'm not an expert with CMake, but looking at your CMakeLists.txt, I can't find any "find_package(XTENSOR_BLAS)" or similar. How can I include it in cmake?

Thanks!

xt::blas::gemm doesn't work for non-square matrix and transpose_A is true

Hey guys,
I noticed that there's a bug in the gemm function. If you try to calculate M = X'*X in the following way:

xt::xarray<double> X = {{1, 2, 3},
                        {1, 2, 3}};
std::vector<int> shape = {3, 3};
xt::xarray<double> M;
M.reshape(shape);

xt::blas::gemm(X, X, M, xt::xscalar<double>(1), xt::xscalar<double>(0), true, false);

Cheers

Support multiple dimensions in cross, pinv and solve

Cross
At the moment, cross seems to be limited to 1D vectors of size 2 or 3. Numpy doesn't have the constraint for dimensionality and is capable of computing cross for higher dimensionality arrays if the last dimension has length of 2 or 3.


xt::xarray<double> firstArray = xt::ones<double>({ 3, 4, 2 });
xt::xarray<double> secondArray = xt::ones<double>({ 3, 4, 2 });

auto result = xt::linalg::cross(firstArray, secondArray);
std::runtime_error("a or b did not have appropriate size 2 or 3.");

NumPy:

a = numpy.ones([3, 4, 2])
b = numpy.ones([3, 4, 2])
numpy.cross(a, b)
array([[0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])
numpy.cross(a, b).shape
(3, 4)

Pinv

Pinv seems to support only 1D or 2D arrays. NumPy does support higher dimensionality.


xt::xarray<double> a = xt::ones<double>({ 1, 2, 3, 4});

auto result = xt::linalg::pinv(a);

XTENSOR_ASSERT_MSG(N == e.derived_cast().dimension(), "Cannot change dimension of xtensor.");

NumPy:

numpy.linalg.pinv(numpy.ones([1, 2, 3, 4]))
array([[[[0.08333333, 0.08333333, 0.08333333],
         [0.08333333, 0.08333333, 0.08333333],
         [0.08333333, 0.08333333, 0.08333333],
         [0.08333333, 0.08333333, 0.08333333]],

        [[0.08333333, 0.08333333, 0.08333333],
         [0.08333333, 0.08333333, 0.08333333],
         [0.08333333, 0.08333333, 0.08333333],
         [0.08333333, 0.08333333, 0.08333333]]]])

a = numpy.array([[[1, 2, 3]]])
a.shape
(1, 1, 3)
numpy.linalg.pinv(a)
array([[[0.07142857],
        [0.14285714],
        [0.21428571]]])

Solve
Solve seems to be limited to <= 2D by XTENSOR_ASSERT.

auto a = xt::xarray<double>({ {0.72999173, 0.94937934, 0.6160863},
                            {0.67719638, 0.11662088, 0.86027322},
                            {0.22839025, 0.14736096, 0.36511296} });
auto b = xt::xarray<double>({ {{0.42682534, 0.18769713},
                            {0.09106305, 0.93373785},
                            {0.3472225 , 0.05267208}},

                            {{0.09015187, 0.94741843},
                            {0.85322852, 0.25576913},
                            {0.47621316, 0.5724512}} });
auto result = xt::linalg::solve(a, b);
XTENSOR_ASSERT failed.

NumPy:

a = numpy.random.rand(9).reshape(3,3)
b = numpy.random.rand(12).reshape(2, 3, 2)
numpy.linalg.solve(a, b)
array([[[-3.08713549,  3.33567358],
        [ 1.29121029, -1.49942674],
        [ 2.36096848, -1.33713771]],

       [[-1.99462003, -4.35353126],
        [-0.03715958,  2.11458264],
        [ 2.56698863,  3.43769886]]])

Having ability to compute this functions with higher dimensionality arrays would help anyone doing more demanding calculations.

Using XTENSOR_USE_TBB with xtensor-blas fails compilation

Hi, I am trying to use both xtensor-blas and TBB acceleration at the same time. However, the compilation fails.

I am using the following CMakeLists.txt

cmake_minimum_required(VERSION 3.1)
project(test_blas)

# Try to use only anaconda provided stuff
set(CMAKE_PREFIX_PATH /Users/scheibler/anaconda3/envs/mixiva ${CMAKE_PREFIX_PATH})

find_package(xtl REQUIRED)
find_package(xtensor REQUIRED)

add_executable(test_blas src/test_blas.cpp)

OPTION(XTENSOR_USE_XSIMD "simd acceleration for xtensor" ON)
OPTION(XTENSOR_USE_TBB "enable parallelization using intel TBB" ON)

if(XTENSOR_USE_XSIMD)
    set(xsimd_REQUIRED_VERSION 7.0.0)
    find_package(xsimd ${xsimd_REQUIRED_VERSION} REQUIRED)
    message(STATUS "Found xsimd: ${xsimd_INCLUDE_DIRS}/xsimd")
endif()

if(XTENSOR_USE_TBB)
    set(CMAKE_MODULE_PATH "${CMAKE_MODULE_PATH}" "${CMAKE_CURRENT_SOURCE_DIR}/cmake/")
    find_package(TBB REQUIRED)
    message(STATUS "Found intel TBB: ${TBB_INCLUDE_DIRS}")
endif()

if(MSVC)
    target_compile_options(test_blas PRIVATE /EHsc /MP /bigobj)
    set(CMAKE_EXE_LINKER_FLAGS /MANIFEST:NO)
endif()

if (CMAKE_CXX_COMPILER_ID MATCHES "Clang" OR
    CMAKE_CXX_COMPILER_ID MATCHES "GNU" OR
    (CMAKE_CXX_COMPILER_ID MATCHES "Intel" AND NOT WIN32))
    target_compile_options(test_blas PRIVATE -march=native -std=c++14)
endif()

add_definitions(-DHAVE_CBLAS=1)
add_definitions(-DXTENSOR_USE_TBB)
add_definitions(-DXTENSOR_USE_SIMD)

# set(BLA_VENDOR Intel10_64lp)

if (WIN32)
    find_package(OpenBLAS REQUIRED)
    set(BLAS_LIBRARIES ${CMAKE_INSTALL_PREFIX}${OpenBLAS_LIBRARIES})
else()
    find_package(BLAS REQUIRED)
    find_package(LAPACK REQUIRED)
endif()

message(STATUS "BLAS VENDOR:    " ${BLA_VENDOR})
message(STATUS "BLAS LIBRARIES: " ${BLAS_LIBRARIES})

target_link_libraries(test_blas xtensor ${BLAS_LIBRARIES} tbb)

And the code is

#include <iostream>
#include <xtensor/xtensor.hpp>
#include <xtensor/xrandom.hpp>
#include <xtensor-blas/xlinalg.hpp>

int main()
{
  xt::xtensor<double, 2, xt::layout_type::row_major> x = xt::random::randn<double>({1000, 1000});
  xt::xtensor<double, 2, xt::layout_type::row_major> y = xt::random::randn<double>({1000, 1000});
  auto y_T = xt::transpose(y);

  xt::xtensor<double, 2> z = xt::linalg::dot(x, y_T);

  for (size_t i ; i < 10 ; i++)
    z = xt::linalg::dot(x, y_T);
  std::cout << z(0, 0) << std::endl;
}

The output of compilation is this.

make test_blas
Scanning dependencies of target test_blas
[ 50%] Building CXX object CMakeFiles/test_blas.dir/src/test_blas.cpp.o
In file included from /Users/scheibler/Documents/Research/Projects/Current/Separation/mixiva/test_xtensor/src/test_blas.cpp:5:
In file included from /Users/scheibler/anaconda3/envs/mixiva/include/xtensor-blas/xlinalg.hpp:29:
In file included from /Users/scheibler/anaconda3/envs/mixiva/include/xtensor-blas/xlapack.hpp:23:
In file included from /Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/cxxlapack.cxx:37:
In file included from /Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/cxxlapack.tcc:36:
In file included from /Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/interface/interface.tcc:36:
In file included from /Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/interface/bbcsd.tcc:39:
In file included from /Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/netlib.h:25:
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:973:39: error:
      expected ')'
                    const FLOAT      *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:968:20: note:
      to match this '('
LAPACK_IMPL(cggbak)(const char       *JOB,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:990:39: error:
      expected ')'
                    FLOAT            *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:982:20: note:
      to match this '('
LAPACK_IMPL(cggbal)(const char       *JOB,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:1087:39: error:
      expected ')'
                    FLOAT            *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:1070:20: note:
      to match this '('
LAPACK_IMPL(cggevx)(const char       *BALANC,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:6787:39: error:
      expected ')'
                    const DOUBLE     *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:6782:20: note:
      to match this '('
LAPACK_IMPL(dggbak)(const char       *JOB,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:6804:39: error:
      expected ')'
                    DOUBLE           *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:6796:20: note:
      to match this '('
LAPACK_IMPL(dggbal)(const char       *JOB,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:6902:39: error:
      expected ')'
                    DOUBLE           *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:6884:20: note:
      to match this '('
LAPACK_IMPL(dggevx)(const char       *BALANC,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:13109:39: error:
      expected ')'
                    const FLOAT      *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:13104:20: note:
      to match this '('
LAPACK_IMPL(sggbak)(const char       *JOB,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:13126:39: error:
      expected ')'
                    FLOAT            *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:13118:20: note:
      to match this '('
LAPACK_IMPL(sggbal)(const char       *JOB,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:13224:39: error:
      expected ')'
                    FLOAT            *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:13206:20: note:
      to match this '('
LAPACK_IMPL(sggevx)(const char       *BALANC,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:19244:39: error:
      expected ')'
                    const DOUBLE     *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:19239:20: note:
      to match this '('
LAPACK_IMPL(zggbak)(const char       *JOB,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:19261:39: error:
      expected ')'
                    DOUBLE           *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:19253:20: note:
      to match this '('
LAPACK_IMPL(zggbal)(const char       *JOB,
                   ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:19358:39: error:
      expected ')'
                    DOUBLE           *LSCALE,
                                      ^
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/sys/sysctl.h:542:17: note:
      expanded from macro 'LSCALE'
#define LSCALE  1000            /* scaling for "fixed point" arithmetic */
                ^
/Users/scheibler/anaconda3/envs/mixiva/include/xflens/cxxlapack/netlib/interface/lapack.in.h:19341:20: note:
      to match this '('
LAPACK_IMPL(zggevx)(const char       *BALANC,
                   ^
12 errors generated.
make[3]: *** [CMakeFiles/test_blas.dir/src/test_blas.cpp.o] Error 1
make[2]: *** [CMakeFiles/test_blas.dir/all] Error 2
make[1]: *** [CMakeFiles/test_blas.dir/rule] Error 2
make: *** [test_blas] Error 2

The code is compiled on Mac OS X Mojave (10.14.4) with Apple LLVM

Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX10.14.sdk/usr/include/c++/4.2.1
Apple LLVM version 10.0.1 (clang-1001.0.46.4)
Target: x86_64-apple-darwin18.5.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

When I add #undef XTENSOR_USE_TBB at the beginning of the code, it compiles and runs as expected.

`find_package(xtensor-blas REQUIRED)` does not expose targets

How should user consume xtensor-blas? I discovered that there is an xtensor_blas_INCLUDE_DIRS that I can use to populate targets in my library via target_include_dirs but I would prefer to simply use target_link_libraries(mylib <LINKAGE> xtensor_blas).

Add some tests with stride=0

xtensor has an optimization where a stride can be 0.
We should add some tests to cover this special stride length, and make sure that the BLAS implementations we're testing against are fine with it.

dot product is slow!

I'm trying to reach the same performance as NumPy. With this code below I evaluate the speed of dot product using xtensor-blas:
benchmark.cpp

#include <iostream>
#include "xtensor/xarray.hpp"
#include "xtensor/xrandom.hpp"
#include "xtensor-blas/xlinalg.hpp"


xt::xtensor<double, 2> my_dot(xt::xtensor<double, 2> &A, xt::xtensor<double, 2> &B){
    auto start = std::chrono::steady_clock::now();

    xt::xtensor<double, 2> C = xt::linalg::dot(A, B);

    auto end = std::chrono::steady_clock::now();
    std::cout << std::chrono::duration<double, std::milli>(end - start) .count()
                << " ms" << std::endl;
    return C;
}


int main(int argc, char *argv[])
{
    for (int i=0;i<3;i++){
        xt::xtensor<double, 2> A = xt::random::randn({1000,2000}, 0., 1.);
        xt::xtensor<double, 2> B = xt::random::randn({2000,2000}, 0., 1.);

        xt::xtensor<double, 2> C = my_dot(A, B);
    }
    return 0;
}

I compile this with this command:

$ g++ benchmark.cpp -O3 -mavx2 -ffast-math \
 -I/home/--user--/install_path/include \
 -lblas -llapack \
 -DHAVE_CBLAS=1 \
 -o a.out

and I get

$ ./a.out
2395.9 ms
2415.98 ms
2461.93 ms

(Install_path/include contains all necessary libraries I.e xtensor, xtensor-blas...)
While when I multiply arrays of the same size in NumPy I get

$ python benchmark.py
76.17 ms
76.05 ms
79.00 ms

So this is mush SLOWER!
I've installed sudo apt-get install libblas-dev liblapack-dev
System:

Ubuntu: 18.04
gcc: 7.4.0
latest versions of xtensor libraries

Could you please help me with this?

xt::linalg multiplication always yields 0 vectors for random input array

When mulitplying two arrays, the result is always vector of 0. (shape agrees, tho).

auto x = xt::random::randn<double>({784});
auto weights = xt::xarray<double> ({1});  // should perform identity

auto res = xt::linalg::outer(x, xt::transpose(weights));
auto res = xt::linalg::tensordot(x, xt::transpose(weights), 0);   // same result, also same without transpose

The output was observed in Jupyter Notebook... however, when compiled and executed in my program, I get even stranger error:

Parameter 8 to routine cblas_dger was incorrect

Quick help appreciated!

EDIT: This seems to be the case only for xt::random::randn. When using xt::arange, the output is as expected.

get_leading_stride() is too restrictive

I tried to do the following:

xtensor<double, 1> A{1, 2, 3, 4, 5}, 
                   B{0.25, 0.5, 0.25}, 
                   C(B.shape(), 0);

// create a view to A looking like
//    1, 2, 3,
//    2, 3, 4,
//    3, 4, 5
static_shape<std::size_t, 2> shape{3, 3}, strides{1, 1};
auto A_as_matrix = adapt(&A(0), 9, no_ownership(), shape, strides);

// call matrix multiplication
C = linalg::dot(A_as_matrix, B);

but got the error message "No valid layout chosen." from get_leading_stride() (xblas_utils.hpp, line 118). However, when I hard-wired get_leading_stride() to return 1, I got the desired result, so my general idea seems to be valid. How can this be solved?

test_tensordot fails to compile

Current git version tests fail to compile when building test_tensordot.cpp.o.
The message (repeated several times) is:
/usr/include/xtensor/xiterator.hpp:324:86: error: invalid use of incomplete type ‘struct xt::detail::LAYOUT_FORBIDEN_FOR_XITERATOR<(xt::layout_type)0>’
using checking_type = typename detail::LAYOUT_FORBIDEN_FOR_XITERATOR::type;
^
/usr/include/xtensor/xiterator.hpp:271:16: note: declaration of ‘struct xt::detail::LAYOUT_FORBIDEN_FOR_XITERATOR<(xt::layout_type)0>’
struct LAYOUT_FORBIDEN_FOR_XITERATOR;
^~~~~~~~~~~~~~~~~~~~~~~~~~~~~

This is with gcc version 8.3.1 20190223 (Red Hat 8.3.1-2) (GCC)

MKL linkage on dot product issues general protection fault

Simple reproducible:

    xt::xarray<float> __im2col = {{ 2.,  3.,  3.,  4.},
                                      { 2.,  1.,  3.,  2.}};
    xt::xarray<float> __f = {{ 2.,  3.,  3.},
                                  { 1.,  4.,  5.},
                                  { 1.,  3.,  5.},
                                  { 2.,  2.,  2.}};
    auto _result = xt::linalg::dot(__im2col, __f);

Gives a stack trace in lldb:

EXC_BAD_ACCESS (code=EXC_I386_GPFLT)
         frame #0: 0x000000010067d894 unit_tests`mkl_blas_avx2_dgemm_kernel_nocopy_TN_b0 + 48724
     unit_tests`mkl_blas_avx2_sgemm_kernel_nocopy_TN_b0:
     ->  0x10067d894 <+48724>: vgatherqpd %ymm6, -0x80(%rcx,%ymm7), %ymm0
         0x10067d89b <+48731>: vmovaps %ymm3, %ymm6
         0x10067d89f <+48735>: vgatherqpd %ymm6, -0x78(%rcx,%ymm7), %ymm1
         0x10067d8a6 <+48742>: vmovaps %ymm3, %ymm6

When I link with cblas instead, it does not fault.

I am using apple clang:

(idp3) ❯ g++ --version
Configured with: --prefix=/Applications/Xcode.app/Contents/Developer/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
Apple LLVM version 9.1.0 (clang-902.0.39.2)
Target: x86_64-apple-darwin17.7.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin

It is tested with gcc-8 that this error does NOT occur.

xt::linalg::trace does not support complex arrays

The xt::linalg::trace cannot deal with complex arrays, which is reflected in this source code (line 1315):

template <class T>
    auto trace(const xexpression<T>& M, int offset = 0, int axis1 = 0, int axis2 = 1)
    {
        const auto& dM = M.derived_cast();
        auto d = xt::diagonal(dM, offset, std::size_t(axis1), std::size_t(axis2));
        
        std::size_t dim = d.dimension();
        if (dim == 1)
        {
            return xt::xarray<double>(xt::sum(d)());
        }
        else
        {
            return xt::xarray<double>(xt::sum(d, {dim - 1}));
        }
    }

I tried to modify this part directly by changing xarray<double> to xarray<std::complex<double>>. But after I do this, I get another error from xutils.hpp of xtensor(line 628):

template <class T>
    struct conditional_cast_functor<true, T>
    {
        template <class U>
        inline auto operator()(U&& u) const
        {
            return static_cast<T>(std::forward<U>(u));
        }
    };

Error:

Cannot convert 'const std::__1::complex<double>' to 'double' without a conversion operator

Although there is a workaround (tracing the real and the complex parts separately), this is not convenient. Could you please look into this problem and fix it? Thanks!

Buffer overflow when dotting 2 xarray

Hello!
I found a buffer overflow in xtensor-blas while dotting 2 array trough address sanitizer.

Source:

#include <iostream>

#include <xtensor/xarray.hpp>
#include <xtensor/xio.hpp>
#include <xtensor-blas/xlinalg.hpp>

using namespace std;


int main()
{
	auto a = xt::zeros<float>({4,2});
	auto b = xt::zeros<float>({2,5});
	cout << xt::linalg::dot(a, b) << endl;
	return 0;
}

Compiled with parameters -O3 -fsanitize=address on GCC and clang and running the binary results in the following.

=================================================================
==9741==ERROR: AddressSanitizer: heap-buffer-overflow on address 0x604000000078 at pc 0x564c811eda8b bp 0x7ffccb3e7300 sp 0x7ffccb3e72f8
READ of size 4 at 0x604000000078 thread T0
    #0 0x564c811eda8a in void cxxblas::gemv_generic<int, float, float, float, float, float>(cxxblas::StorageOrder, cxxblas::Transpose, cxxblas::Transpose, int, int, float const&, float const*, int, float const*, int, float const&, float*, int) (/home/marty/文件/xtensor-test/main+0x11ea8a)
    #1 0x564c811f1fe1 in void cxxblas::gemm_generic<int, float, float, float, float, float>(cxxblas::StorageOrder, cxxblas::Transpose, cxxblas::Transpose, int, int, int, float const&, float const*, int, float const*, int, float const&, float*, int) (/home/marty/文件/xtensor-test/main+0x122fe1)
    #2 0x564c811e7814 in auto xt::linalg::dot<xt::xbroadcast<xt::xscalar<float const>, std::array<unsigned long, 2ul> >, xt::xbroadcast<xt::xscalar<float const>, std::array<unsigned long, 2ul> > >(xt::xexpression<xt::xbroadcast<xt::xscalar<float const>, std::array<unsigned long, 2ul> > > const&, xt::xbroadcast<xt::xscalar<float const>, std::array<unsigned long, 2ul> ><xt::xbroadcast<xt::xscalar<float const>, std::array<unsigned long, 2ul> > > const&) (/home/marty/文件/xtensor-test/main+0x118814)
    #3 0x564c811e6201 in main (/home/marty/文件/xtensor-test/main+0x117201)
    #4 0x7f95181a2f49 in __libc_start_main (/usr/lib/libc.so.6+0x20f49)
    #5 0x564c810ea989 in _start (/home/marty/文件/xtensor-test/main+0x1b989)

0x604000000078 is located 0 bytes to the right of 40-byte region [0x604000000050,0x604000000078)
allocated by thread T0 here:
    #0 0x564c811e1c71 in operator new(unsigned long) (/home/marty/文件/xtensor-test/main+0x112c71)
    #1 0x564c811ea010 in void xt::xstrided_container<xt::xtensor_container<xt::uvector<float, std::allocator<float> >, 2ul, (xt::layout_type)255, xt::xtensor_expression_tag> >::reshape<std::array<unsigned long, 2ul> >(std::array<unsigned long, 2ul>&&, bool) (/home/marty/文件/xtensor-test/main+0x11b010)

SUMMARY: AddressSanitizer: heap-buffer-overflow (/home/marty/文件/xtensor-test/main+0x11ea8a) in void cxxblas::gemv_generic<int, float, float, float, float, float>(cxxblas::StorageOrder, cxxblas::Transpose, cxxblas::Transpose, int, int, float const&, float const*, int, float const*, int, float const&, float*, int)
Shadow bytes around the buggy address:
  0x0c087fff7fb0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c087fff7fc0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c087fff7fd0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c087fff7fe0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
  0x0c087fff7ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
=>0x0c087fff8000: fa fa 00 00 00 00 00 fa fa fa 00 00 00 00 00[fa]
  0x0c087fff8010: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c087fff8020: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c087fff8030: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c087fff8040: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
  0x0c087fff8050: fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa fa
Shadow byte legend (one shadow byte represents 8 application bytes):
  Addressable:           00
  Partially addressable: 01 02 03 04 05 06 07 
  Heap left redzone:       fa
  Freed heap region:       fd
  Stack left redzone:      f1
  Stack mid redzone:       f2
  Stack right redzone:     f3
  Stack after return:      f5
  Stack use after scope:   f8
  Global redzone:          f9
  Global init order:       f6
  Poisoned by user:        f7
  Container overflow:      fc
  Array cookie:            ac
  Intra object redzone:    bb
  ASan internal:           fe
  Left alloca redzone:     ca
  Right alloca redzone:    cb
==9741==ABORTING

No idea if this is a xtensor-blas bug or a BLAS library bug thought.

Many thanks.

non-contiguous array

This is the simplest example that I can construct that still produce this error.

xt::pytensor<double, 2> dot_test(xt::pytensor<double, 3> & ar0,
                                 xt::pytensor<double, 2> & ar1,
                                 xt::pytensor<long, 1> & idx) {
  auto shape0 = ar0.shape();
  auto shape1 = ar1.shape();
  xt::pytensor<double, 2> out({shape0[0], shape0[1]});
  out.fill(NAN);
  for (unsigned j=0; j<len(idx); ++j)
    xt::view(out, j) =
      xt::linalg::dot(xt::view(ar0, j),
                      xt::view(ar1, idx(j)));
  return out;
}

If we give a non-contiguous numpy array as the first argument, it fails.

m1 = np.random.randn(30).reshape(2,3,5)
m2 = np.random.randn(4).reshape(2,2)
xt_tools.dot_test(m1[:,:,2:4], m2, np.arange(2))

Traceback (most recent call last):
File "", line 1, in
RuntimeError: No valid layout chosen.

**As the issue I raised a year ago, replacing the dot product line with the following somehow solves the problem. But this is probably not a good permanent fix.
xt::linalg::dot(xt::eval(xt::view(ar0, j)), xt::eval(xt::view(ar1, idx(j))));

import fail -- Symbol not found: _dgetrf_

I used the cookiecutter to start my own development environment (https://github.com/QuantStack/xtensor-python-cookiecutter). After verifying that it works, I simply added a new function that involves xt::linalg::inv to test out the linear algebra library.

inline xt::pyarray<double> example3(xt::pyarray<double> &m) { return xt::linalg::inv(m); }

And also added the include lines

#include "xtensor-blas/xblas.hpp" #include "xtensor-blas/xlinalg.hpp"

It did compile with the existing setup.py, but upon import I get this error from the interpreter.

ImportError: dlopen(xtensor-example-extension/xtensor_example_extension.cpython-36m-darwin.so, 2): Symbol not found: dgetrf
Referenced from: xtensor-example-extension/xtensor_example_extension.cpython-36m-darwin.so
Expected in: flat namespace

Does anyone know why this is happening? Do I need to add something to setup.py?

crash on destruction of result of dot

Next code crashes on destruction of a3:

    {
        xt::xarray<double> a1{1, 2};
        {
            xt::xarray<double> a2{
                {1, 1, 1},
                {1, 2, 3}
            };
            {
                xt::xarray<double> a3 = xt::linalg::dot(a1, a2);
                std::cout << a3 << std::endl;
            } // crashes here
        }
    }

Also, I expected (3 5 7) as result but the console says {3, 5} (wrong result, wrong shape).
Is this a bug? Or, maybe I'm using xtensor-blas in wrong way, here?

xt::linalg::pinv crashes with both Visual Studio 2015 and Visual Studio 2017

I'm using the xTensor master, with xTL 0.5.4 and OpenBlas (v0.2.14.1) on the Windows 10 x64.

When I tried to use xt::linalg::pinv as in QuanStack/xtensor-blas/test/test_linalg.cpp, line 225 in the debug mode:

xt::xtensor<float, 2> d1 {{1.f, 2.f}}; auto r1 = xt::linalg::pinv(d1);

With Visual Studio 2017, the test crashed right away with the following error:

Message: Invalid parameter detected in function std::array<__int64, 1>::operator [], c:\program files (x86)\microsoft visual studio\2017\enterprise\vc\tools\msvc\14.16.27023\include\array line 184, Expression: “array subscript out of

range”

With Visual Studio 2015, the test passed. However it would crash with the same error as in Visual Studio 2017 when the debugger is attached.

Is it a known issue? Any suggestion?

The norm doesn't support float

Support for float precision is missing

xt::linalg::norm(xt::arange<float>(15), 1); // fails to build
xt::linalg::norm(xt::arange<double>(15), 1); // build ok

Added test to showcase this
#127

Improve handling of empty arrays in dot and solve

At the moment, at least dot and solve seems to handle empty arrays improperly compared to NumPy.

dot

xt::xarray<double> a = xt::ones<double>({ 1, 2, 0, 1});
xt::xarray<double> b = xt::ones<double>({ 1, 0 });
auto result = xt::linalg::dot(a, b);
access violation - cblas_ddot

NumPy:

a = numpy.ones([1, 2, 0, 1])
b = numpy.ones([1, 0])
numpy.dot(a, b)
array([], shape=(1, 2, 0, 0), dtype=float64)

solve

auto a = xt::ones<double>({ 0, 0 });
auto b = xt::ones<double>({ 0 });
auto result = xt::linalg::solve(a, b);
C++ runtime assertion failed
File: gesv.tcc
Line: 98

Expression: info>=0

Numpy:

a = numpy.ones([0, 0])
b = numpy.ones([0])
numpy.linalg.solve(a, b)
array([], dtype=float64)

No installpath/include/xtensor-blas/flens directory after installation

Hi,

When I install xtensor-blas, the content of the directory xtensor-blas/flens is directly copied into installpath/include/ . It follows that the variable FLENS_INCLUDE_DIR provided by xtensor-blasConfig.cmake is not defined.
Furthermore, the project name in the xtensor-blasConfig.cmake writes xtensor_blas while one could expect to find xtensor-blas instead.

If I correct CMakeLists by appending xtensor-blas/flens for the install include path and if I use variable ${xtensor_blas_INCLUDE_DIRS}, I am able to use xtensor-blas as third-party library.

By the way, I really like the work you are doing ;-) !

Cheers,

Thibaud Kloczko
[email protected]

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    🖖 Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. 📊📈🎉

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google ❤️ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.