xtensor-stack / xsimd

C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)

Home Page: https://xsimd.readthedocs.io/

License: BSD 3-Clause "New" or "Revised" License

C++ 97.47% CMake 1.29% Shell 1.02% HTML 0.01% Python 0.21%
simd-intrinsics vectorization simd cpp avx neon sse avx512 simd-instructions mathematical-functions

xsimd's People

Contributors

amyspark, andre-bergner, anutosh491, astrohawk, bluescarni, cyb70289, derthorsten, easyaspi314, emmenlau, guyuqi, hadrieng2, hleclerc, jatinchowdhury18, johanmabille, luhenry, mainland, martinrenou, maxmarsc, mehdichinoune, omaralvarez, pitrou, serge-sans-paille, sylvaincorlay, tdegeus, tokinobug, tomjnixon, wermos, wolfv, yumeyao, zhihaoy

xsimd's Issues

Make xaligned_malloc default

I have benchmarked posix_memalign and _mm_malloc, and they are quite a bit slower for 32 bit alignments than the xaligned_malloc/free.

Or we could enable/disable them by some compile time flags?
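
For context, a hand-rolled aligned allocator typically over-allocates with malloc and stores the raw pointer just before the aligned address. The sketch below illustrates that general technique only. The names sketch_aligned_malloc and sketch_aligned_free are hypothetical, and this is not claimed to be how xaligned_malloc is implemented.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical helper: returns a pointer aligned to `alignment` bytes
// (alignment must be a power of two), or nullptr on failure.
inline void* sketch_aligned_malloc(std::size_t size, std::size_t alignment)
{
    // Room for the payload, worst-case padding, and the saved raw pointer.
    void* raw = std::malloc(size + alignment + sizeof(void*));
    if (raw == nullptr)
        return nullptr;
    std::uintptr_t start = reinterpret_cast<std::uintptr_t>(raw) + sizeof(void*);
    std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
    std::uintptr_t aligned = (start + mask) & ~mask;
    // Store the raw pointer immediately before the aligned block.
    reinterpret_cast<void**>(aligned)[-1] = raw;
    return reinterpret_cast<void*>(aligned);
}

// Hypothetical helper: releases memory obtained from sketch_aligned_malloc.
inline void sketch_aligned_free(void* ptr)
{
    if (ptr != nullptr)
        std::free(reinterpret_cast<void**>(ptr)[-1]);
}
```

A compile-time flag, as suggested above, could then select between a helper like this and posix_memalign or _mm_malloc.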

Can not build with Intel Compiler on MSVC because cmake fails

Currently the build fails for me with the latest Intel Compiler on MSVC (and Visual Studio 2015). The build aborts in cmake configuration with error

-- Performing Test HAS_CPP11_FLAG
-- Performing Test HAS_CPP11_FLAG - Failed
CMake Error at test/CMakeLists.txt:53 (message):
  Unsupported compiler -- xsimd requires C++11 support!

-- Configuring incomplete, errors occurred!

I assume the detection is not perfect, because the Intel Compiler has support for C++11. Is there anything I can do to help?

Aligned Pool Allocator

An aligned pool allocator that keeps memory around until explicit flush/cleanup would be a nice additional feature.

batch<float, 8>::store_unaligned() segfaults

In xsimd_avx_float.hpp, the function batch<float, 8>::store_unaligned() calls the aligned intrinsic _mm256_store_ps() instead of the unaligned _mm256_storeu_ps(), leading to a segfault when the address is actually unaligned. I didn't check if the same problem occurs for other types.

CMakeLists.txt compliant with Debian packaging

The following changes are required:

  • download gtest instead of using local installation
  • test installed package instead of local one
  • remove CMAKE_SIZEOF_VOID_P from xsimdConfigVersion.cmake

Compiling Simple Test Program

Hello!

I tried compiling the simple test program you include in your documentation

#include <iostream>
#include "xsimd/xsimd.hpp"

namespace xs = xsimd;

int main(int argc, char* argv[])
{
    xs::batch<double, 4> a(1.5, 2.5, 3.5, 4.5);
    xs::batch<double, 4> b(2.5, 3.5, 4.5, 5.5);
    auto mean = (a + b) / 2;
    std::cout << mean << std::endl;
    return 0;
}

I compile using
g++ -std=c++14 -o out -I xsimd/include/ main.cpp
where gcc is version 5.4.1.

I get the following error

In file included from xsimd/types/xsimd_types_include.hpp:15:0,
                 from xsimd/types/xsimd_traits.hpp:12,
                 from xsimd/xsimd.hpp:14,
                 from main.cpp:2:
xsimd/types/xsimd_sse_int32.hpp: In function ‘xsimd::batch<int, 4ul> xsimd::select(const xsimd::batch_bool<int, 4ul>&, const xsimd::batch<int, 4ul>&, const xsimd::batch<int, 4ul>&)’:
xsimd/types/xsimd_sse_int32.hpp:441:70: error: ‘s’ was not declared in this scope
         return _mm_or_si128(_mm_and_si128(cond, a), _mm_andnot_si128(s, b));
                                                                      ^
In file included from xsimd/types/xsimd_types_include.hpp:16:0,
                 from xsimd/types/xsimd_traits.hpp:12,
                 from xsimd/xsimd.hpp:14,
                 from main.cpp:2:
xsimd/types/xsimd_sse_int64.hpp: In function ‘xsimd::batch<long int, 2ul> xsimd::select(const xsimd::batch_bool<long int, 2ul>&, const xsimd::batch<long int, 2ul>&, const xsimd::batch<long int, 2ul>&)’:
xsimd/types/xsimd_sse_int64.hpp:460:70: error: ‘s’ was not declared in this scope
         return _mm_or_si128(_mm_and_si128(cond, a), _mm_andnot_si128(s, b));
                                                                      ^

I'm sure it's something simple I'm missing.

Thanks!

operator<< and operator>> implementation

The current implementations of these operators rely on operator[] and operate on the elements of a batch one at a time.
They should be refactored to use specific intrinsics for better performance.

Test failures for old Intel arch

There are some small failures when loading 64-bit integers from char with -arch=nocona:

[ RUN      ] xsimd.api_load
1: lhs = 67305985 - rhs = 1
2: lhs = 134678021 - rhs = 2
load uchar   -> int64  : BAD
Nb diff  : 2 (100%)
1: lhs = 67305985 - rhs = 1
2: lhs = 134678021 - rhs = 2
loadu uchar  -> int64  : BAD
Nb diff  : 2 (100%)
/home/wolfv/Programs/xsimd/test/xsimd_api_test.cpp:70: Failure
Value of: res
  Actual: false
Expected: true
[  FAILED  ] xsimd.api_load (0 ms)

and

[ RUN      ] xsimd.sse_load
1: lhs = 67305985 - rhs = 1
2: lhs = 134678021 - rhs = 2
load uchar  -> int64  : BAD
Nb diff  : 2 (100%)
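
A note on the numbers in these logs: 67305985 is 0x04030201 and 134678021 is 0x08070605. Those are the values you get when four consecutive bytes 1, 2, 3, 4 (or 5, 6, 7, 8) are read as one little-endian 32-bit integer instead of being widened one by one. The sketch below only checks that arithmetic. It assumes the test input is the byte sequence 1, 2, 3, ..., which the log does not confirm, and pack_le32 is a hypothetical helper.

```cpp
#include <cstdint>

// Hypothetical helper: combine four bytes into one 32-bit value,
// least significant byte first (little-endian order).
inline std::uint32_t pack_le32(std::uint8_t b0, std::uint8_t b1,
                               std::uint8_t b2, std::uint8_t b3)
{
    return static_cast<std::uint32_t>(b0)
         | (static_cast<std::uint32_t>(b1) << 8)
         | (static_cast<std::uint32_t>(b2) << 16)
         | (static_cast<std::uint32_t>(b3) << 24);
}
```

If that reading is right, the uchar -> int64 load path may be reinterpreting the source bytes rather than converting each element.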

AVX512 plans?

Hello xsimd developers,

I was wondering if you have any timeline on AVX512 support?

I recently gained access to a Skylake-AVX512 Xeon server, and wanted to test out how the new instructions would fare on my xsimd-enhanced libraries. I went ahead and hacked together some very preliminary support for AVX512 in xsimd which is nevertheless sufficient for some initial experiments (single-precision floats, basic maths). Posting the link here, in case it is of any use:

master...bluescarni:avx512

(note that I am pretty much a newbie when it comes to SIMD programming, so probably there are inefficiencies and mistakes)

Thanks for the great library!

Remove need for -flax-vector-types on ARM/GCC

char loading to other batches currently makes use of lax conversion enabled by default on clang (apparently).
We should remove the need to enable this compiler flag on GCC so that it compiles without hassle (by using static casts where appropriate).

Implement transpose

We should implement a transpose operation to transpose NxN matrix blocks (where N is batch width).
The interface should probably look like the one for haddp, e.g. taking a pointer of rows.

template <class T, std::size_t N>
void transpose(batch<T, N>* rows)
{
... inplace transpose ...
}
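
To make the proposed interface concrete, here is a minimal scalar sketch. It uses std::array as a stand-in for batch<T, N> so that it compiles without xsimd (that substitution and the name row_t are assumptions), and it swaps elements one by one. A real implementation would use shuffle/unpack intrinsics instead.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Stand-in for batch<T, N>: a fixed-size array of N lanes.
template <class T, std::size_t N>
using row_t = std::array<T, N>;

// In-place transpose of an N x N block given as a pointer to N rows.
template <class T, std::size_t N>
void transpose(row_t<T, N>* rows)
{
    for (std::size_t i = 0; i < N; ++i)
        for (std::size_t j = i + 1; j < N; ++j)
            std::swap(rows[i][j], rows[j][i]); // mirror across the diagonal
}
```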

scatter and gather instructions

It could be interesting to support Intel's scatter and gather instructions (however, scatter seems to be available only since AVX512).

is_nan vs isnan

The standard library uses std::isnan, and xsimd already has xsimd::isinf.
So is there a reason for xsimd::is_nan, or would it make sense to rename it to xsimd::isnan to match the STL and unify the xsimd interface?

Add arange

A SIMDified arange can give a ~4x improvement over std::iota and for loops.
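
A rough sketch of the idea, written with plain arrays so it does not depend on any particular intrinsics: keep one "batch" of lane values, store it, then advance every lane by the batch width. The function name arange_chunked and the default width of 4 are assumptions for illustration. In a real version the inner loop would be one batch store and one batch add.

```cpp
#include <cstddef>

// Fill out[0..n) with start, start + 1, ... in chunks of `width` lanes.
template <std::size_t width = 4>
void arange_chunked(float* out, std::size_t n, float start)
{
    float lanes[width];
    for (std::size_t k = 0; k < width; ++k)
        lanes[k] = start + static_cast<float>(k); // {start, start + 1, ...}

    std::size_t i = 0;
    for (; i + width <= n; i += width)
    {
        for (std::size_t k = 0; k < width; ++k)
        {
            out[i + k] = lanes[k];                 // conceptually one batch store
            lanes[k] += static_cast<float>(width); // conceptually one batch add
        }
    }
    for (; i < n; ++i) // scalar tail for the remaining elements
        out[i] = start + static_cast<float>(i);
}
```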

Unit tests should exercise constructors

So, I tried to integrate #100 in my project to see if it solves the problem it's intended to solve, and I discovered a bunch of issues related to constructors of batch and batch_bool. This means that we don't yet have unit tests for them.

Current batch_bool<float/double> equality operator on SSE/AVX wrong

We're using a __m256d floating point type to store the batch bool for float/double in SSE and AVX, and we do the equality comparison using variants of mm_cmp_pd/ps(...).

The problem is that these functions check for NaN, since NaNs are incomparable, and you can select a mode to get the desired result (e.g. with an ordered comparison, comparing a NaN with a number yields false).

The other fact is that a true value is represented by setting all bits to 1 – including the NaN indicating bit.

So currently, if you construct batch_bool<double, 4> a(true) and evaluate a == a, the result is filled with false, because each lane compares two NaNs with each other.

Solution a) switch to an integer type inside of batch_bool. This also reduces the number of implementations (as we can share them between int/float and int64/double).
We just need to add a cast to, or a constructor from, __m256d (as that's still the result type of the comparison of float/double batches).

Solution b) cast to int before comparison and cast back for storage.
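
The effect can be reproduced on a single scalar lane without any intrinsics. The sketch below builds a double with all 64 bits set (the representation of a true lane described above) and compares it with itself, once as a floating-point value and once as raw bits. The helper names are hypothetical, and the behaviour assumes IEEE 754 semantics (no -ffast-math).

```cpp
#include <cstdint>
#include <cstring>

// Build a double whose 64 bits are all set, like a "true" mask lane.
inline double all_bits_set()
{
    std::uint64_t bits = ~std::uint64_t(0);
    double d;
    std::memcpy(&d, &bits, sizeof d);
    return d;
}

// Floating-point comparison: a NaN never compares equal, even to itself.
inline bool equal_as_double(double a, double b)
{
    return a == b;
}

// Bitwise comparison, analogous to casting the mask to an integer type first.
inline bool equal_as_bits(double a, double b)
{
    std::uint64_t x, y;
    std::memcpy(&x, &a, sizeof x);
    std::memcpy(&y, &b, sizeof y);
    return x == y;
}
```

With t = all_bits_set(), equal_as_double(t, t) is false while equal_as_bits(t, t) is true, which is the difference between the current behaviour and either proposed solution.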

add a sincos method

Some of the mathematical functions need both the sine and the cosine of their argument. Since these functions share some steps in their computation (among them, the reduction), a sincos method computing both sine and cosine in a single pass may improve performance.
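
A scalar sketch of the shared-reduction idea: reduce the argument to a multiple of pi/2 plus a small remainder once, evaluate both kernels on the remainder, then pick and sign the results by quadrant. Here std::sin and std::cos stand in for the polynomial kernels, the reduction is the naive one (it loses accuracy for large |x|), and the name sincos_sketch is hypothetical.

```cpp
#include <cmath>
#include <utility>

// Returns {sin(x), cos(x)} using a single range reduction.
inline std::pair<double, double> sincos_sketch(double x)
{
    const double half_pi = 1.57079632679489661923;
    // Shared step: x = n * (pi/2) + r, with r roughly in [-pi/4, pi/4].
    const double n = std::nearbyint(x / half_pi);
    const double r = x - n * half_pi;
    // Stand-ins for the sine and cosine kernels on the reduced argument.
    const double s = std::sin(r);
    const double c = std::cos(r);
    // Select and sign the results according to the quadrant n mod 4.
    switch (static_cast<long long>(n) & 3)
    {
    case 0:  return { s, c };
    case 1:  return { c, -s };
    case 2:  return { -s, -c };
    default: return { -c, s };
    }
}
```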

A version of select() with a compile-time mask would be nice
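
One possible shape for such an interface, sketched with std::array standing in for a batch: the mask is encoded in a type, so the compiler knows every lane's choice at compile time and a real implementation could map it to a blend instruction with an immediate operand. The names bool_mask and select_ct are hypothetical, not an existing xsimd API.

```cpp
#include <array>
#include <cstddef>

// The mask is carried in the type, so it is known at compile time.
template <bool... Values>
struct bool_mask
{
};

// Lane-wise select: takes a[i] where the mask is true, b[i] otherwise.
template <class T, std::size_t N, bool... Values>
std::array<T, N> select_ct(bool_mask<Values...>,
                           const std::array<T, N>& a,
                           const std::array<T, N>& b)
{
    static_assert(sizeof...(Values) == N, "mask size must match batch size");
    constexpr bool mask[] = { Values... };
    std::array<T, N> out{};
    for (std::size_t i = 0; i < N; ++i)
        out[i] = mask[i] ? a[i] : b[i];
    return out;
}
```

A call would look like select_ct(bool_mask<true, false, false, true>{}, a, b).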

Testing fails on Debian

Using version 3.0.1 + fix for select, here is the output from the tests:

[==========] Running 22 tests from 1 test case.
[----------] Global test environment set-up.
[----------] 22 tests from xsimd
[ RUN      ] xsimd.sse_float_trigonometric
[       OK ] xsimd.sse_float_trigonometric (9 ms)
[ RUN      ] xsimd.sse_double_trigonometric
[       OK ] xsimd.sse_double_trigonometric (9 ms)
[ RUN      ] xsimd.sse_float_rounding
/<<PKGBUILDDIR>>/test/xsimd_rounding_test.cpp:34: Failure
Value of: res
  Actual: false
Expected: true
[  FAILED  ] xsimd.sse_float_rounding (0 ms)
[ RUN      ] xsimd.sse_double_rounding
/<<PKGBUILDDIR>>/test/xsimd_rounding_test.cpp:41: Failure
Value of: res
  Actual: false
Expected: true
[  FAILED  ] xsimd.sse_double_rounding (0 ms)
[ RUN      ] xsimd.sse_float_power
[       OK ] xsimd.sse_float_power (6 ms)
[ RUN      ] xsimd.sse_double_power
[       OK ] xsimd.sse_double_power (4 ms)
[ RUN      ] xsimd.sse_float_hyperbolic
[       OK ] xsimd.sse_float_hyperbolic (8 ms)
[ RUN      ] xsimd.sse_double_hyperbolic
[       OK ] xsimd.sse_double_hyperbolic (5 ms)
[ RUN      ] xsimd.sse_float_fp_manipulation
[       OK ] xsimd.sse_float_fp_manipulation (0 ms)
[ RUN      ] xsimd.sse_double_fp_manipulation
[       OK ] xsimd.sse_double_fp_manipulation (0 ms)
[ RUN      ] xsimd.sse_float_exponential
/<<PKGBUILDDIR>>/test/xsimd_exponential_test.cpp:35: Failure
Value of: res
  Actual: false
Expected: true
[  FAILED  ] xsimd.sse_float_exponential (10 ms)
[ RUN      ] xsimd.sse_double_exponential
/<<PKGBUILDDIR>>/test/xsimd_exponential_test.cpp:42: Failure
Value of: res
  Actual: false
Expected: true
[  FAILED  ] xsimd.sse_double_exponential (7 ms)
[ RUN      ] xsimd.sse_float_error_gamma
/<<PKGBUILDDIR>>/test/xsimd_error_gamma_test.cpp:35: Failure
Value of: res
  Actual: false
Expected: true
[  FAILED  ] xsimd.sse_float_error_gamma (23 ms)
[ RUN      ] xsimd.sse_double_error_gamma
/<<PKGBUILDDIR>>/test/xsimd_error_gamma_test.cpp:42: Failure
Value of: res
  Actual: false
Expected: true
[  FAILED  ] xsimd.sse_double_error_gamma (13 ms)
[ RUN      ] xsimd.sse_float_basic_math
[       OK ] xsimd.sse_float_basic_math (0 ms)
[ RUN      ] xsimd.sse_double_basic_math
[       OK ] xsimd.sse_double_basic_math (0 ms)
[ RUN      ] xsimd.sse_float_basic
[       OK ] xsimd.sse_float_basic (0 ms)
[ RUN      ] xsimd.sse_double_basic
[       OK ] xsimd.sse_double_basic (0 ms)
[ RUN      ] xsimd.sse_int32_basic
[       OK ] xsimd.sse_int32_basic (0 ms)
[ RUN      ] xsimd.sse_int64_basic
[       OK ] xsimd.sse_int64_basic (0 ms)
[ RUN      ] xsimd.sse_conversion
[       OK ] xsimd.sse_conversion (0 ms)
[ RUN      ] xsimd.sse_cast
[       OK ] xsimd.sse_cast (0 ms)
[----------] 22 tests from xsimd (94 ms total)

[----------] Global test environment tear-down
[==========] 22 tests from 1 test case ran. (94 ms total)
[  PASSED  ] 16 tests.
[  FAILED  ] 6 tests, listed below:
[  FAILED  ] xsimd.sse_float_rounding
[  FAILED  ] xsimd.sse_double_rounding
[  FAILED  ] xsimd.sse_float_exponential
[  FAILED  ] xsimd.sse_double_exponential
[  FAILED  ] xsimd.sse_float_error_gamma
[  FAILED  ] xsimd.sse_double_error_gamma

 6 FAILED TESTS

xsimd includes cause failure for specific arch

I included xsimd.hpp from cloning the repo, then going into includes, and making a main.cpp at the same path:

#include "xsimd.hpp"
#include <iostream>
int main(){
	std::cout << "Hello world.";
}

and it fails with the following error when I try to compile:

root@dc0238c74c63:/.../native/third_party/xsimd/include/xsimd# g++ main.cpp -o main
In file included from types/xsimd_types_include.hpp:22,
                 from types/xsimd_traits.hpp:14,
                 from xsimd.hpp:14,
                 from main.cpp:1:
types/xsimd_sse_int8.hpp: In static member function 'static xsimd::detail::sse_int8_batch_kernel<signed char>::batch_type xsimd::detail::batch_kernel<signed char, 16>::abs(const batch_type&)':
types/xsimd_sse_int8.hpp:590:32: error: '_mm_srai_epi8' was not declared in this scope
                 __m128i sign = _mm_srai_epi8(rhs, 31);
                                ^~~~~~~~~~~~~
types/xsimd_sse_int8.hpp:590:32: note: suggested alternative: '_mm_srai_epi32'
                 __m128i sign = _mm_srai_epi8(rhs, 31);
                                ^~~~~~~~~~~~~
                                _mm_srai_epi32

Here is my system information:

root@dc0238c74c63:/# cat /proc/cpuinfo
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 6
model		: 63
model name	: Intel(R) Xeon(R) CPU @ 2.30GHz
stepping	: 0
microcode	: 0x1
cpu MHz		: 2300.000
cache size	: 46080 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
apicid		: 0
initial apicid	: 0
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs		:
bogomips	: 4600.00
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:

processor	: 1
vendor_id	: GenuineIntel
cpu family	: 6
model		: 63
model name	: Intel(R) Xeon(R) CPU @ 2.30GHz
stepping	: 0
microcode	: 0x1
cpu MHz		: 2300.000
cache size	: 46080 KB
physical id	: 0
siblings	: 2
core id		: 0
cpu cores	: 1
apicid		: 1
initial apicid	: 1
fpu		: yes
fpu_exception	: yes
cpuid level	: 13
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ss ht syscall nx pdpe1gb rdtscp lm constant_tsc rep_good nopl xtopology nonstop_tsc pni pclmulqdq ssse3 fma cx16 pcid sse4_1 sse4_2 x2apic movbe popcnt aes xsave avx f16c rdrand hypervisor lahf_lm abm fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid xsaveopt
bugs		:
bogomips	: 4600.00
clflush size	: 64
cache_alignment	: 64
address sizes	: 46 bits physical, 48 bits virtual
power management:
root@dc0238c74c63:/# gcc --version
gcc (Ubuntu 8.1.0-1ubuntu1) 8.1.0
Copyright (C) 2018 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
root@dc0238c74c63:/# lsb_release -a
No LSB modules are available.
Distributor ID:	Ubuntu
Description:	Ubuntu 18.04.1 LTS
Release:	18.04
Codename:	bionic

Documentation search bar is broken

For some reason, the ReadTheDocs documentation does not have a working search index yet. That is sad, as it is a very important feature for finding functions in the docs when one doesn't know how the docs are organized.

P.S. This issue likely also affects other QuantStack projects using the same doc generation toolchain.
