
packed_simd's Introduction

Simd<[T; N]>

Implementation of Rust RFC #2366: std::simd


WARNING: this crate only supports the most recent nightly Rust toolchain and will be superseded by #![feature(portable_simd)].

Documentation

Examples

Most of the examples come with both a scalar and a vectorized implementation.

Cargo features

  • into_bits (default: disabled): enables FromBits/IntoBits trait implementations for the vector types. These allow reinterpreting the bits of a vector type as those of another vector type safely by just using the .into_bits() method.
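As a rough illustration of what "reinterpreting the bits" means, the following std-only sketch models a vector as a plain array. The helpers `bits_of` and `floats_from_bits` are hypothetical names chosen for this example and are not part of packed_simd's API; they only show that the reinterpretation keeps every bit pattern intact and can be done without `unsafe`.

```rust
/// Reinterprets each f32 lane as the u32 with the same bit pattern.
/// Hypothetical helper: models a 4-lane vector as a plain array.
fn bits_of(v: [f32; 4]) -> [u32; 4] {
    v.map(f32::to_bits)
}

/// Inverse of `bits_of`: rebuilds the f32 lanes from their bit patterns.
fn floats_from_bits(v: [u32; 4]) -> [f32; 4] {
    v.map(f32::from_bits)
}

fn main() {
    let v = [1.0_f32, -0.0, 0.5, 2.0];
    let bits = bits_of(v);
    // IEEE 754 single precision: 1.0 is 0x3f80_0000, -0.0 is only the sign bit.
    assert_eq!(bits[0], 0x3f80_0000);
    assert_eq!(bits[1], 0x8000_0000);
    // The round trip restores the original lanes.
    assert_eq!(floats_from_bits(bits), v);
}
```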

Performance

The following ISPC examples are also part of packed_simd's examples/ directory, where packed_simd+rayon are used to emulate ISPC's Single-Program-Multiple-Data (SPMD) programming model. The performance results on different hardware are shown in the readme.md of each example. The following list summarizes the performance ranges, where + means speed-up and - means slowdown:

  • aobench: [-1.02x, +1.53x],
  • stencil: [+1.06x, +1.72x],
  • mandelbrot: [-1.74x, +1.2x],
  • options_pricing:
    • black_scholes: +1.0x
    • binomial_put: +1.4x

While SPMD is not the intended use case for packed_simd, it is possible to combine the library with rayon to poorly emulate ISPC's SPMD programming model in Rust. Writing performant code is not as straightforward as with ISPC, but with some care (e.g. see the Performance Guide) one can easily match and often out-perform ISPC's "default performance".
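To give a feel for the shape of such an emulation, here is a minimal std-only sketch. It deliberately swaps rayon for `std::thread::scope` and a packed_simd vector for a fixed-size group of `LANES` elements, so it shows only the structure (same program, many data chunks), not the crate's API or its performance. `square_all` and `LANES` are names made up for this example.

```rust
use std::thread;

/// Assumed vector width for the sketch; stands in for a 4-lane SIMD vector.
const LANES: usize = 4;

/// Squares every element in place. Each thread runs the same code on its
/// own chunk, and inside a chunk the data is walked in LANES-sized groups.
fn square_all(data: &mut [f32], chunk_len: usize) {
    assert!(chunk_len > 0, "chunk_len must be non-zero");
    thread::scope(|s| {
        for chunk in data.chunks_mut(chunk_len) {
            s.spawn(move || {
                let mut groups = chunk.chunks_exact_mut(LANES);
                // "Vector" part: full groups of LANES elements.
                for group in &mut groups {
                    for x in group.iter_mut() {
                        *x = *x * *x;
                    }
                }
                // Scalar tail: elements that do not fill a whole group.
                for x in groups.into_remainder() {
                    *x = *x * *x;
                }
            });
        }
    });
}

fn main() {
    let mut data: Vec<f32> = (0..20).map(|i| i as f32).collect();
    square_all(&mut data, 8);
    assert_eq!(data[19], 361.0);
}
```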

Platform support

The following table describes the supported platforms: build shows whether the library compiles without issues for a given target, while run shows whether the test suite passes for a given target.

Linux build run
i586-unknown-linux-gnu
i686-unknown-linux-gnu
x86_64-unknown-linux-gnu
arm-unknown-linux-gnueabihf
armv7-unknown-linux-gnueabi
aarch64-unknown-linux-gnu
powerpc-unknown-linux-gnu
powerpc64-unknown-linux-gnu
powerpc64le-unknown-linux-gnu
s390x-unknown-linux-gnu
sparc64-unknown-linux-gnu
thumbv7neon-unknown-linux-gnueabihf
MacOSX build run
x86_64-apple-darwin
Android build run
x86_64-linux-android
armv7-linux-androideabi
aarch64-linux-android
thumbv7neon-linux-androideabi
iOS build run
x86_64-apple-ios
aarch64-apple-ios

Machine code verification

The verify/ crate disassembles the portable packed vector APIs at run-time and compares the generated machine code against the desired one to make sure that this crate remains efficient.

License

This project is licensed under either of

  • Apache License, Version 2.0
  • MIT license

at your option.

Contributing

We welcome all people who want to contribute. Please see the contributing instructions for more information.

Contributions in any form (issues, pull requests, etc.) to this project must adhere to Rust's Code of Conduct.

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in packed_simd by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

packed_simd's People

Contributors

3for, artoria2e5, atouchet, burrbull, calebzulawski, chaitrex, dependabot-preview[bot], dependabot-support, dhardy, dtolnay, eclipseo, ecstatic-morse, firstyear, gabrielmajeri, gnzlbg, hkratz, hsivonen, jhorstmann, johntitor, lokathor, mati865, nivkner, oli-obk, pietroalbini, rustyyato, stupremee, theironborn, thomcc, vks, workingjubilee


packed_simd's Issues

{i686,x86_64}-pc-windows-gnu Appveyor builds failing

It appears that incorrect results are produced on Windows when the GNU toolchain is used and NaNs are involved.

failures:
---- v64::f32x2_math_cos::cos stdout ----
thread 'v64::f32x2_math_cos::cos' panicked at 'assertion failed: `(left == right)`
  left: `f32x2(-0.00000004371139, -0.00000004371139)`,
 right: `f32x2(NaN, -0.00000004371139)`', src\v64.rs:43:1
---- v64::f32x2_math_fma::fma stdout ----
thread 'v64::f32x2_math_fma::fma' panicked at 'assertion failed: `(left == right)`
  left: `f32x2(1.0, 1.0)`,
 right: `f32x2(NaN, NaN)`', src\v64.rs:43:1
---- v64::f32x2_math_sin::sin stdout ----
thread 'v64::f32x2_math_sin::sin' panicked at 'assertion failed: `(left == right)`
  left: `f32x2(1.0, 1.0)`,
 right: `f32x2(NaN, 1.0)`', src\v64.rs:43:1
---- v64::f32x2_ops_scalar_arith::ops_scalar_arithmetic stdout ----
thread 'v64::f32x2_ops_scalar_arith::ops_scalar_arithmetic' panicked at 'assertion failed: `(left == right)`
  left: `f32x2(NaN, 0.0)`,
 right: `f32x2(0.0, 0.0)`', src\v64.rs:43:1
---- v64::f32x2_ops_vector_arith::ops_vector_arithmetic stdout ----
thread 'v64::f32x2_ops_vector_arith::ops_vector_arithmetic' panicked at 'assertion failed: `(left == right)`
  left: `f32x2(NaN, 0.0)`,
 right: `f32x2(0.0, 0.0)`', src\v64.rs:43:1
failures:
    v64::f32x2_math_cos::cos
    v64::f32x2_math_fma::fma
    v64::f32x2_math_sin::sin
    v64::f32x2_ops_scalar_arith::ops_scalar_arithmetic
    v64::f32x2_ops_vector_arith::ops_vector_arithmetic
test result: FAILED. 1436 passed; 5 failed; 0 ignored; 0 measured; 0 filtered out

cc @retep007

rotate tests fail in s390x-unknown-linux-gnu and sparc64-unknown-linux-gnu

s390x-unknown-linux-gnu

---- v128::i128x1_ops_vector_rotate::rotate_ops stdout ----
thread 'v128::i128x1_ops_vector_rotate::rotate_ops' panicked at 'assertion failed: `(left == right)`
  left: `i128x1(1208925819614629174706176)`,
 right: `i128x1(1)`', src/v128.rs:68:1
---- v128::u128x1_ops_vector_rotate::rotate_ops stdout ----
thread 'v128::u128x1_ops_vector_rotate::rotate_ops' panicked at 'assertion failed: `(left == right)`
  left: `u128x1(1208925819614629174706176)`,
 right: `u128x1(1)`', src/v128.rs:72:1

sparc64-unknown-linux-gnu

---- v128::i128x1_ops_vector_rotate::rotate_ops stdout ----
thread 'v128::i128x1_ops_vector_rotate::rotate_ops' panicked at 'assertion failed: `(left == right)`
  left: `i128x1(2)`,
 right: `i128x1(1)`', src/v128.rs:68:1
---- v128::u128x1_ops_vector_rotate::rotate_ops stdout ----
thread 'v128::u128x1_ops_vector_rotate::rotate_ops' panicked at 'assertion failed: `(left == right)`
  left: `u128x1(2)`,
 right: `u128x1(1)`', src/v128.rs:72:1
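For reference, the scalar behaviour of 128-bit rotates can be checked with the standard library alone. The sketch below does not reproduce the failing test (its exact rotate amounts are not shown in the logs above); `rotate_round_trip` is a hypothetical helper. It only notes that both wrong values in the logs, 1208925819614629174706176 (which is 1 << 80) and 2, are themselves rotations of 1.

```rust
/// Scalar reference: rotating left by `n` and then right by `n`
/// must give back the original value for every `n`.
fn rotate_round_trip(x: u128, n: u32) -> u128 {
    x.rotate_left(n).rotate_right(n)
}

fn main() {
    for n in 0..128 {
        assert_eq!(rotate_round_trip(1, n), 1);
    }
    // The value reported on s390x is 1 rotated left by 80 bits.
    assert_eq!(1u128.rotate_left(80), 1208925819614629174706176);
    // The value reported on sparc64 is 1 rotated left by 1 bit.
    assert_eq!(1u128.rotate_left(1), 2);
}
```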

floating-point reductions fail on ppc64 (big-endian)

See:

https://travis-ci.org/rust-lang-nursery/packed_simd/jobs/411174331#L4944

---- v128::f32x4_reduction_min_max_nan::max_element_test stdout ----
thread 'v128::f32x4_reduction_min_max_nan::max_element_test' panicked at 'assertion failed: `(left == right)`
  left: `0.0`,
 right: `-3.0`: [D]: nan at 0 => 0 | f32x4(NaN, -3.0, -3.0, -3.0)', src/v128.rs:42:1
---- v128::f32x4_reduction_min_max_nan::min_element_test stdout ----
thread 'v128::f32x4_reduction_min_max_nan::min_element_test' panicked at 'assertion failed: `(left == right)`
  left: `0.0`,
 right: `-3.0`: [F]: nan at 1 => 0 | f32x4(NaN, NaN, -3.0, -3.0)', src/v128.rs:42:1
---- v256::f32x8_reduction_min_max_nan::max_element_test stdout ----
thread 'v256::f32x8_reduction_min_max_nan::max_element_test' panicked at 'assertion failed: `(left == right)`
  left: `0.0`,
 right: `-3.0`: [D]: nan at 0 => 0 | f32x8(NaN, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0)', src/v256.rs:48:1
---- v256::f32x8_reduction_min_max_nan::min_element_test stdout ----
thread 'v256::f32x8_reduction_min_max_nan::min_element_test' panicked at 'assertion failed: `(left == right)`
  left: `0.0`,
 right: `-3.0`: [F]: nan at 3 => 0 | f32x8(NaN, NaN, NaN, NaN, -3.0, -3.0, -3.0, -3.0)', src/v256.rs:48:1
---- v512::f32x16_reduction_min_max_nan::max_element_test stdout ----
thread 'v512::f32x16_reduction_min_max_nan::max_element_test' panicked at 'assertion failed: `(left == right)`
  left: `0.0`,
 right: `-3.0`: [D]: nan at 0 => 0 | f32x16(NaN, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0)', src/v512.rs:59:1
---- v512::f32x16_reduction_min_max_nan::min_element_test stdout ----
thread 'v512::f32x16_reduction_min_max_nan::min_element_test' panicked at 'assertion failed: `(left == right)`
  left: `0.0`,
 right: `-3.0`: [F]: nan at 7 => 0 | f32x16(NaN, NaN, NaN, NaN, NaN, NaN, NaN, NaN, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0, -3.0)', src/v512.rs:59:1
---- v64::f32x2_reduction_min_max_nan::max_element_test stdout ----
thread 'v64::f32x2_reduction_min_max_nan::max_element_test' panicked at 'assertion failed: `(left == right)`
  left: `0.0`,
 right: `-3.0`: [D]: nan at 0 => 0 | f32x2(NaN, -3.0)', src/v64.rs:43:1
---- v64::f32x2_reduction_min_max_nan::min_element_test stdout ----
thread 'v64::f32x2_reduction_min_max_nan::min_element_test' panicked at 'assertion failed: `(left == right)`
  left: `0.0`,
 right: `-3.0`: [D]: nan at 0 => 0 | f32x2(NaN, -3.0)', src/v64.rs:43:1
failures:
    v128::f32x4_reduction_min_max_nan::max_element_test
    v128::f32x4_reduction_min_max_nan::min_element_test
    v256::f32x8_reduction_min_max_nan::max_element_test
    v256::f32x8_reduction_min_max_nan::min_element_test
    v512::f32x16_reduction_min_max_nan::max_element_test
    v512::f32x16_reduction_min_max_nan::min_element_test
    v64::f32x2_reduction_min_max_nan::max_element_test
    v64::f32x2_reduction_min_max_nan::min_element_test
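The expected value of -3.0 in these logs suggests that the tests want NaN lanes to be skipped by the reduction. That reading is inferred from the log output, not from the crate's documentation. Under that assumption, a scalar reference can be sketched with `f32::max` and `f32::min`, which return the non-NaN argument when exactly one argument is NaN; `max_element` and `min_element` here are free functions written for this example.

```rust
/// Scalar reference for a NaN-skipping maximum over all lanes.
/// Returns NaN only if every lane is NaN (or the slice is empty).
fn max_element(v: &[f32]) -> f32 {
    v.iter().copied().fold(f32::NAN, f32::max)
}

/// Scalar reference for a NaN-skipping minimum over all lanes.
fn min_element(v: &[f32]) -> f32 {
    v.iter().copied().fold(f32::NAN, f32::min)
}

fn main() {
    let nan = f32::NAN;
    // Same inputs as the f32x4 cases in the log above.
    assert_eq!(max_element(&[nan, -3.0, -3.0, -3.0]), -3.0);
    assert_eq!(min_element(&[nan, nan, -3.0, -3.0]), -3.0);
    assert!(max_element(&[nan, nan]).is_nan());
}
```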

appveyor build bots run out of memory

Three of the four AppVeyor build bots currently run out of memory.

We should try to find a way to minimize memory consumption in the ci/run.sh file.

A good start would be to switch to:

  • single job: --jobs 1 / CARGO_BUILD_JOBS=1
  • single code-gen unit: RUSTFLAGS="-C codegen-units=1"
  • no debug info: RUSTFLAGS="-C debuginfo=0"

Investigate using llvm.bswap for vertical byte_swap

Currently we use shuffles for implementing vertical byte_swap, but llvm.bswap works on vectors as well. We should investigate which method generates better code, use it, and file LLVM bugs for the other (they should generate identical code).
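The two approaches should at least agree on results. The following std-only sketch models a 4 x u32 vector as arrays and checks that a shuffle-style byte permutation matches a per-lane `swap_bytes`, the scalar counterpart of a bswap. All function names are hypothetical and nothing here touches the crate's internals or LLVM intrinsics.

```rust
/// Views four u32 lanes as 16 bytes in native memory order.
fn to_bytes(v: [u32; 4]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (lane, chunk) in v.iter().zip(out.chunks_exact_mut(4)) {
        chunk.copy_from_slice(&lane.to_ne_bytes());
    }
    out
}

/// Rebuilds four u32 lanes from 16 bytes in native memory order.
fn from_bytes(b: [u8; 16]) -> [u32; 4] {
    let mut out = [0u32; 4];
    for (lane, chunk) in out.iter_mut().zip(b.chunks_exact(4)) {
        *lane = u32::from_ne_bytes([chunk[0], chunk[1], chunk[2], chunk[3]]);
    }
    out
}

/// Shuffle-style swap: byte i comes from index [3,2,1,0, 7,6,5,4, ...][i].
fn byte_swap_shuffle(v: [u8; 16]) -> [u8; 16] {
    let mut out = [0u8; 16];
    for (i, o) in out.iter_mut().enumerate() {
        let lane_start = i / 4 * 4;
        *o = v[lane_start + (3 - i % 4)];
    }
    out
}

/// bswap-style swap: reverse the bytes of each lane independently.
fn byte_swap_lanes(v: [u32; 4]) -> [u32; 4] {
    v.map(u32::swap_bytes)
}

fn main() {
    let v = [0x1122_3344, 0xAABB_CCDD, 0, 0x0000_00FF];
    let via_shuffle = from_bytes(byte_swap_shuffle(to_bytes(v)));
    assert_eq!(via_shuffle, byte_swap_lanes(v));
    assert_eq!(byte_swap_lanes(v)[0], 0x4433_2211);
}
```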

Add bindings for short math vector libraries

LLVM is very far off from properly supporting short-vector floating point math, which basically means that all/most vector math functions get scalarized.

A plan to work around could be:

  • crates to interface with the platform's vector math library:

    • libmvec-sys for libm,
    • svml-sys for svml,
    • etc.

    These crates would be optional features of packed_simd, and each of them would have their own features that users could configure. Hopefully, these crates already exist, but if they don't we'll have to create them. They should not live in packed_simd; they should be their own independent crates, since they are generally useful.

  • use compile-time feature detection in packed_simd to manually call the implementations in those libraries when available.
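The second step could look roughly like the sketch below, which selects an implementation with `cfg` at compile time. The chosen target feature (`avx2`) and the function `sin4` are assumptions for illustration, and both branches fall back to scalar `f32::sin` because no binding crate is assumed to exist here. The gated branch only marks where a call into something like libmvec-sys or svml-sys would go.

```rust
/// Selected when the build targets x86_64 with AVX2 enabled
/// (for example via `-C target-feature=+avx2`).
#[cfg(all(target_arch = "x86_64", target_feature = "avx2"))]
fn sin4(x: [f32; 4]) -> [f32; 4] {
    // Placeholder: a real build would call the platform's vector math
    // library through a binding crate here (hypothetical, not shown).
    x.map(f32::sin)
}

/// Fallback for every other target: plain scalar math per lane.
#[cfg(not(all(target_arch = "x86_64", target_feature = "avx2")))]
fn sin4(x: [f32; 4]) -> [f32; 4] {
    x.map(f32::sin)
}

fn main() {
    let y = sin4([0.0, 0.0, 0.0, 0.0]);
    assert_eq!(y, [0.0; 4]);
    // `cfg!` reports at compile time which branch was built.
    let vectorized = cfg!(all(target_arch = "x86_64", target_feature = "avx2"));
    println!("vector-library path compiled in: {vectorized}");
}
```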

mandelbrot test fails on i586-unknown-linux-gnu

See: https://travis-ci.org/rust-lang-nursery/packed_simd/jobs/411233811#L1001

+ cargo test --verbose --target=i586-unknown-linux-gnu --manifest-path=target/mandelbrot/Cargo.toml
+ tee
+ [[ 101 != 0 ]]
+ cat target/output
    Updating registry `https://github.com/rust-lang/crates.io-index`
   Compiling cfg-if v0.1.4
     Running `rustc --crate-name cfg_if /cargo/registry/src/github.com-1ecc6299db9ec823/cfg-if-0.1.4/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 -C metadata=3725132fcf53b8a6 -C extra-filename=-3725132fcf53b8a6 --out-dir /checkout/target/mandelbrot/target/i586-unknown-linux-gnu/debug/deps --target i586-unknown-linux-gnu -L dependency=/checkout/target/mandelbrot/target/i586-unknown-linux-gnu/debug/deps -L dependency=/checkout/target/mandelbrot/target/debug/deps --cap-lints allow -C codegen-units=1`
   Compiling packed_simd v0.1.0 (file:///checkout)
     Running `rustc --crate-name packed_simd /checkout/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 --cfg 'feature="default"' -C metadata=84bb1f34795398f4 -C extra-filename=-84bb1f34795398f4 --out-dir /checkout/target/mandelbrot/target/i586-unknown-linux-gnu/debug/deps --target i586-unknown-linux-gnu -C incremental=/checkout/target/mandelbrot/target/i586-unknown-linux-gnu/debug/incremental -L dependency=/checkout/target/mandelbrot/target/i586-unknown-linux-gnu/debug/deps -L dependency=/checkout/target/mandelbrot/target/debug/deps --extern cfg_if=/checkout/target/mandelbrot/target/i586-unknown-linux-gnu/debug/deps/libcfg_if-3725132fcf53b8a6.rlib -C codegen-units=1`
   Compiling mandelbrot v0.1.0 (file:///checkout/target/mandelbrot)
     Running `rustc --crate-name mandelbrot_lib src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 -C metadata=7ad81051bdcd78ef -C extra-filename=-7ad81051bdcd78ef --out-dir /checkout/target/mandelbrot/target/i586-unknown-linux-gnu/debug/deps --target i586-unknown-linux-gnu -C incremental=/checkout/target/mandelbrot/target/i586-unknown-linux-gnu/debug/incremental -L dependency=/checkout/target/mandelbrot/target/i586-unknown-linux-gnu/debug/deps -L dependency=/checkout/target/mandelbrot/target/debug/deps --extern packed_simd=/checkout/target/mandelbrot/target/i586-unknown-linux-gnu/debug/deps/libpacked_simd-84bb1f34795398f4.rlib -C codegen-units=1`
     Running `rustc --crate-name mandelbrot_lib src/lib.rs --emit=dep-info,link -C debuginfo=2 --test -C metadata=30052dd9fa8b806c -C extra-filename=-30052dd9fa8b806c --out-dir /checkout/target/mandelbrot/target/i586-unknown-linux-gnu/debug/deps --target i586-unknown-linux-gnu -C incremental=/checkout/target/mandelbrot/target/i586-unknown-linux-gnu/debug/incremental -L dependency=/checkout/target/mandelbrot/target/i586-unknown-linux-gnu/debug/deps -L dependency=/checkout/target/mandelbrot/target/debug/deps --extern packed_simd=/checkout/target/mandelbrot/target/i586-unknown-linux-gnu/debug/deps/libpacked_simd-84bb1f34795398f4.rlib -C codegen-units=1`
     Running `rustc --crate-name mandelbrot src/main.rs --emit=dep-info,link -C debuginfo=2 --test -C metadata=a1c8ce1461118ba0 -C extra-filename=-a1c8ce1461118ba0 --out-dir /checkout/target/mandelbrot/target/i586-unknown-linux-gnu/debug/deps --target i586-unknown-linux-gnu -C incremental=/checkout/target/mandelbrot/target/i586-unknown-linux-gnu/debug/incremental -L dependency=/checkout/target/mandelbrot/target/i586-unknown-linux-gnu/debug/deps -L dependency=/checkout/target/mandelbrot/target/debug/deps --extern mandelbrot_lib=/checkout/target/mandelbrot/target/i586-unknown-linux-gnu/debug/deps/libmandelbrot_lib-7ad81051bdcd78ef.rlib --extern packed_simd=/checkout/target/mandelbrot/target/i586-unknown-linux-gnu/debug/deps/libpacked_simd-84bb1f34795398f4.rlib -C codegen-units=1`
    Finished dev [unoptimized + debuginfo] target(s) in 34.66s
     Running `/checkout/target/mandelbrot/target/i586-unknown-linux-gnu/debug/deps/mandelbrot_lib-30052dd9fa8b806c`
running 1 test
test tests::verify_simd ... FAILED
failures:
---- tests::verify_simd stdout ----
thread 'tests::verify_simd' panicked at 'assertion failed: `(left == right)`
  left: `[13, 48, 142, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 168, 205, 237, 168, 205, 237, 168, 205, 237, 168, 205, 237, 168, 205, 237, 168, 205, 237, 238, 247, 233, 238, 247, 233, 238, 247, 233, 246, 212, 127, 170, 114, 0, 63, 44, 0, 26, 90, 185, 13, 48, 142, 170, 114, 0, 170, 114, 0, 0, 2, 16, 0, 2, 16, 246, 212, 127, 0, 0, 0, 238, 247, 233, 168, 205, 237, 168, 205, 237, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185]`,
 right: `[13, 48, 142, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 168, 205, 237, 168, 205, 237, 168, 205, 237, 168, 205, 237, 168, 205, 237, 168, 205, 237, 238, 247, 233, 238, 247, 233, 238, 247, 233, 246, 212, 127, 170, 114, 0, 63, 44, 0, 26, 90, 185, 13, 48, 142, 170, 114, 0, 170, 114, 0, 0, 2, 16, 0, 2, 16, 246, 212, 127, 63, 44, 0, 238, 247, 233, 168, 205, 237, 168, 205, 237, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 83, 144, 216, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185, 26, 90, 185]`: line 0 differs', src/lib.rs:88:17

nbody example fails to compile on armv7-apple-ios

See https://travis-ci.org/rust-lang-nursery/packed_simd/jobs/411264514#L168

+cargo build --verbose --target=armv7-apple-ios --manifest-path=target/nbody/Cargo.toml
+tee
+[[ 101 != 0 ]]
+cat target/output
   Compiling cfg-if v0.1.4
     Running `rustc --crate-name cfg_if /Users/travis/.cargo/registry/src/github.com-1ecc6299db9ec823/cfg-if-0.1.4/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 -C metadata=5d8001ee3ded794a -C extra-filename=-5d8001ee3ded794a --out-dir /Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps --target armv7-apple-ios -L dependency=/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps -L dependency=/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/debug/deps --cap-lints allow -C codegen-units=1 -C target-feature=+neon`
   Compiling packed_simd v0.1.0 (file:///Users/travis/build/rust-lang-nursery/packed_simd)
     Running `rustc --crate-name packed_simd /Users/travis/build/rust-lang-nursery/packed_simd/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 --cfg 'feature="default"' -C metadata=3c5c861616491f51 -C extra-filename=-3c5c861616491f51 --out-dir /Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps --target armv7-apple-ios -C incremental=/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/incremental -L dependency=/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps -L dependency=/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/debug/deps --extern cfg_if=/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/libcfg_if-5d8001ee3ded794a.rlib -C codegen-units=1 -C target-feature=+neon`
   Compiling nbody v0.1.0 (file:///Users/travis/build/rust-lang-nursery/packed_simd/target/nbody)
     Running `rustc --crate-name nbody_lib src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 -C metadata=9b52a83dc6484f92 -C extra-filename=-9b52a83dc6484f92 --out-dir /Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps --target armv7-apple-ios -C incremental=/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/incremental -L dependency=/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps -L dependency=/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/debug/deps --extern packed_simd=/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/libpacked_simd-3c5c861616491f51.rlib -C codegen-units=1 -C target-feature=+neon`
     Running `rustc --crate-name nbody src/main.rs --crate-type bin --emit=dep-info,link -C debuginfo=2 -C metadata=df77513c362d3c5d -C extra-filename=-df77513c362d3c5d --out-dir /Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps --target armv7-apple-ios -C incremental=/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/incremental -L dependency=/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps -L dependency=/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/debug/deps --extern nbody_lib=/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/libnbody_lib-9b52a83dc6484f92.rlib --extern packed_simd=/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/libpacked_simd-3c5c861616491f51.rlib -C codegen-units=1 -C target-feature=+neon`
error: linking with `cc` failed: exit code: 1
  |
  = note: "cc" "-arch" "armv7" "-Wl,-syslibroot" "/Applications/Xcode-9.4.1.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS11.4.sdk" "-L" "/Users/travis/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/armv7-apple-ios/lib" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.16u6js6g0l3k1ic6.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.181cuta0v63atwcm.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.1im38lueib99jsk0.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.1pyg38ew8eq184bu.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.1vut2eft6nlujjxr.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.1y16o1qfye96o7m0.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.28m6b5dkfoixx5aa.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.2jqywn86b2gsqohu.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.2lyh15q6cjwzy18c.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.2qhkzqx5zqexj20y.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.38ps4pa181wsnsy9.rcgu.o" 
"/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.3ayaeypdcro9d6yk.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.3ik0x0hz6l66cx38.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.3ldk0i2zxftngav8.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.3rngp6bm2u2q5z0y.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.3wta9ctgdrpkmlpr.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.45pc7c65foh9i35f.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.49a7n47po4ttqjl7.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.49lx1q7cxvpykyv0.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.4b8ptp1vn215jmoe.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.4oc10dk278mpk1vy.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.4xq48u46a1pwiqn7.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.4ybye971cqflgun6.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.4yh8x2b62dcih00t.rcgu.o" 
"/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.4ypvbwho0bu5tnww.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.51fpy5zjki32la64.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.53py2009ooqfzkcu.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.57k06xfugllsc526.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.5frs3mx5dzjbj7u6.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.8xzrsc1ux72v29j.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.98g0d9x8aw3akpe.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.9elsx31vb4it187.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.bb78ejlb1ru86kx.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.c6lbtaiefvx3wya.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.oa3rad818d8sgn4.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.y08g5q2x813c4wx.rcgu.o" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.z9ox7biyn1otfln.rcgu.o" "-o" 
"/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/nbody-df77513c362d3c5d.crate.allocator.rcgu.o" "-Wl,-dead_strip" "-nodefaultlibs" "-L" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps" "-L" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/debug/deps" "-L" "/Users/travis/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/armv7-apple-ios/lib" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/libnbody_lib-9b52a83dc6484f92.rlib" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/libpacked_simd-3c5c861616491f51.rlib" "/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/libcfg_if-5d8001ee3ded794a.rlib" "/Users/travis/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/armv7-apple-ios/lib/libstd-649c4b8d0e886711.rlib" "/Users/travis/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/armv7-apple-ios/lib/libpanic_unwind-1ae50a28b4422088.rlib" "/Users/travis/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/armv7-apple-ios/lib/libunwind-9d0c322078d36242.rlib" "/Users/travis/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/armv7-apple-ios/lib/liballoc_system-38f6ee9e3fd9fecd.rlib" "/Users/travis/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/armv7-apple-ios/lib/liblibc-24b972b32c966e12.rlib" "/Users/travis/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/armv7-apple-ios/lib/liballoc-63f8d5024acf177e.rlib" "/Users/travis/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/armv7-apple-ios/lib/libcore-c374aa20cea2670e.rlib" 
"/Users/travis/.rustup/toolchains/nightly-x86_64-apple-darwin/lib/rustlib/armv7-apple-ios/lib/libcompiler_builtins-77935fa2dc76146a.rlib" "-lSystem" "-lobjc" "-framework" "Security" "-framework" "Foundation" "-lresolv" "-lc" "-lm"
  = note: ld: library not found for -lcrt1.3.1.o
          clang: error: linker command failed with exit code 1 (use -v to see invocation)
          
error: aborting due to previous error
error: Could not compile `nbody`.
Caused by:
  process didn't exit successfully: `rustc --crate-name nbody src/main.rs --crate-type bin --emit=dep-info,link -C debuginfo=2 -C metadata=df77513c362d3c5d -C extra-filename=-df77513c362d3c5d --out-dir /Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps --target armv7-apple-ios -C incremental=/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/incremental -L dependency=/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps -L dependency=/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/debug/deps --extern nbody_lib=/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/libnbody_lib-9b52a83dc6484f92.rlib --extern packed_simd=/Users/travis/build/rust-lang-nursery/packed_simd/target/nbody/target/armv7-apple-ios/debug/deps/libpacked_simd-3c5c861616491f51.rlib -C codegen-units=1 -C target-feature=+neon` (exit code: 1)

Implement conversions and bitcasts per the RFC

https://github.com/gnzlbg/rfcs/blob/ppv/text/0000-ppv.md#conversions-and-bitcasts:

There are three different ways to convert between vector types.

  • From/Into: value-preserving widening-conversion between vectors with the same number of lanes. That is, f32x4 can be converted into f64x4 using From/Into, but the opposite is not true because that conversion is not value preserving. The From/Into implementations mirror that of the primitive integer and floating-point types. These conversions can widen the size of the element type, and thus the size of the SIMD vector type. Signed vector types are sign-extended lane-wise, while unsigned vector types are zero-extended lane-wise. The result of these conversions is endian-independent.

  • as: non-value preserving truncating-conversions between vectors with the same number of lanes. That is, f64x4 as f32x4 performs a lane-wise as cast, truncating the values if they would overflow the destination type. The result of these conversions is endian-independent.

  • unsafe mem::transmute: bit-casts between vectors with the same size, that is, the vectors do not need to have the same number of lanes. For example, transmuting a u8x16 into a u16x8. Note that while all bit-patterns of the {i,u,f} vector types represent a valid vector value, there are many vector mask bit-patterns that do not represent a valid mask. Note also that the result of unsafe mem::transmute is endian-dependent (see examples below).

As an extension to the RFC, we should implement the FromBits/IntoBits of the RFC as a temporary workaround for the lack of a Compatible trait that allows performing safe transmutes.
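The three kinds of conversion can be modeled with plain arrays. This is only a sketch: the arrays stand in for the vector types, and the function names are hypothetical, not part of the RFC or of packed_simd. It shows why the lane-wise conversions do not depend on endianness while a bit-cast does.

```rust
/// Lane-wise widening, modeling a `From`/`Into` conversion.
/// Each signed lane is sign-extended, so every value is preserved.
fn widen_i8x4(v: [i8; 4]) -> [i32; 4] {
    v.map(i32::from)
}

/// Lane-wise `as` cast, modeling `f64x4 as f32x4`.
/// Values that are not representable in `f32` are not preserved.
fn narrow_f64x4(v: [f64; 4]) -> [f32; 4] {
    v.map(|x| x as f32)
}

/// Bit-cast of four bytes into two 16-bit lanes, modeling a transmute
/// between vectors of the same size. The bytes are reinterpreted in
/// native order, so the result differs between little- and big-endian
/// targets.
fn bitcast_u8x4_to_u16x2(v: [u8; 4]) -> [u16; 2] {
    [
        u16::from_ne_bytes([v[0], v[1]]),
        u16::from_ne_bytes([v[2], v[3]]),
    ]
}
```

On a little-endian target `bitcast_u8x4_to_u16x2([1, 2, 3, 4])` yields `[0x0201, 0x0403]`, while a big-endian target yields `[0x0102, 0x0304]`.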

Bad compare codegen

Just me or is this bad codegen?

extern crate packed_simd;
use std::arch::x86_64::*;
use packed_simd::*;

pub fn le_i8x16(x: i8x16, y: i8x16) -> bool {
    x.le(y).all()
}
	.section	__TEXT,__text,regular,pure_instructions
	.globl	__ZN9temp_test8le_i8x1617h13e72ae756b29df4E
	.p2align	4, 0x90
__ZN9temp_test8le_i8x1617h13e72ae756b29df4E:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	vmovdqa	(%rdi), %xmm0
	vpcmpgtb	(%rsi), %xmm0, %xmm0
	vpcmpeqd	%xmm1, %xmm1, %xmm1
	vpxor	%xmm1, %xmm0, %xmm0
	vptest	%xmm1, %xmm0
	setb	%al
	popq	%rbp
	retq
	.cfi_endproc

vs

pub unsafe fn int_u8x16(x: __m128i, y: __m128i) -> bool {
    let mask = _mm_cmpgt_epi8(x, y);
    1 == _mm_test_all_zeros(mask, mask)
}
	.globl	__ZN9temp_test9int_u8x1617h00120f998e226c4eE
	.p2align	4, 0x90
__ZN9temp_test9int_u8x1617h00120f998e226c4eE:
	.cfi_startproc
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	vmovdqa	(%rdi), %xmm0
	vpcmpgtb	(%rsi), %xmm0, %xmm0
	vptest	%xmm0, %xmm0
	sete	%al
	popq	%rbp
	retq
	.cfi_endproc
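
For reference, the two listings compute the same predicate in different ways: the first inverts the `vpcmpgtb` mask and then checks that every bit is set, while the second checks directly that no bit of the mask is set. A scalar model of both formulations, using plain arrays as stand-ins for `i8x16` (the function names are made up for this sketch):

```rust
/// Lane-wise `x <= y`, followed by an `all` reduction over the mask.
fn le_all(x: [i8; 16], y: [i8; 16]) -> bool {
    x.iter().zip(y.iter()).all(|(a, b)| a <= b)
}

/// Lane-wise `x > y`, followed by a check that no lane is set.
/// For integers this is the same predicate as `le_all`.
fn gt_none(x: [i8; 16], y: [i8; 16]) -> bool {
    !x.iter().zip(y.iter()).any(|(a, b)| a > b)
}
```

Since the two functions agree for all integer inputs, the extra `vpcmpeqd` / `vpxor` pair in the first listing looks avoidable in principle.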

floating-point `sum` and `product` produce incorrect results when NaNs are present

Due to https://bugs.llvm.org/show_bug.cgi?id=36732, wrapping_sum / wrapping_product are implemented with fast-math flags unconditionally enabled. This leads to inconsistencies, such as the reductions returning a NaN for which the is_nan() method returns false.

We'll probably need to work around these issues here in stdsimd.


Now that LLVM7 has been merged we should be able to disable the fast math flags in rust-lang/rust to fix this issue.
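
A scalar reference that a test could compare the vector reduction against might look like the sketch below. It assumes strict IEEE arithmetic and a left-to-right evaluation order; the vector reduction may associate the lanes differently, so only the NaN behaviour is meant to carry over. `strict_sum` is a hypothetical helper name.

```rust
/// Strict (no fast-math) sum of the lanes, evaluated left to right.
/// Any NaN lane propagates to the result.
fn strict_sum(lanes: [f32; 4]) -> f32 {
    lanes.iter().fold(0.0, |acc, &x| acc + x)
}
```

With this reference, `strict_sum([1.0, f32::NAN, 2.0, 3.0]).is_nan()` holds, which is the property the fast-math version breaks.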

u128xN casts missing

Any particular reason for this?

(It speeds up a component of rand's UniformInt by up to 2x. We could implement it with the platform intrinsic, but that's not ideal.)

packed simd vectors shall not implement PartialOrd

Pros/Cons of implementing PartialOrd

The PartialOrd implementation for vectors has the following disadvantages:

  • It is surprising to new users: new users often do not expect a lexicographical ordering. For example, they might expect < to return true only if all lanes of the resulting mask are true which is a reasonable expectation to have.

  • Another potentially-surprising consequence of implementing a lexicographical ordering is that both &[i32] and &[i32x4] implement PartialOrd and Ord and therefore can be sorted, but the ordering of the elements will be different.

  • If anything I'd see this as an argument in favor of pointwise ordering as it makes such code not compile any more, which is the best possible behavior (better than compiling it but doing something strange).

  • New users would often be better served by rewriting their code to just use the vertical comparisons producing masks.

  • It is slow: it is a horizontal operation, typically requiring two vertical comparisons (e.g. < and ==), and then a scalar loop over the masks, e.g.:

    fn lt(&self, other: &Self) -> bool {
        let m_lt = Self::lt(*self, *other);
        let m_eq = Self::eq(*self, *other);
        for i in 0..$id::lanes() {
            if m_eq.extract(i) {
                continue;
            }
            return m_lt.extract(i);
        }
        false
    }

    Also, there are many generic types that are often instantiated for portable vector types, like &[f32x4] where users would probably be better served by the PartialOrd implementation for &[f32].

The advantages of providing a PartialOrd implementation are:

  • It is not trivial to implement it yourself.
  • TODO: summarize other advantages?

Actionable Improvements

We could significantly reduce the disadvantages while maintaining most of the advantages by:

  • Providing methods on the vector types that implement a partial or total order (returning Ordering and/or Option<Ordering>) without implementing PartialOrd for the portable vector types. This would allow users that need a PartialOrd implementation to easily write a new type wrapper, and use the inherent method to implement partial_cmp or cmp for it. This would also allow us to implement multiple different ordering relations for floating point types, where we could provide a partial_cmp and cmp method implementing a partial and total order, and let the user choose which one they want to use.

  • Provide specializations of PartialOrd for types commonly instantiated with portable vector types, like &[f32x4], where instead of using partial_cmp of f32x4 we just fall back to the PartialOrd implementation of &[f32].
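
The first suggestion could look roughly like the sketch below. `I32x4`, `lex_cmp` and `LexOrdered` are hypothetical names; `I32x4` is an array-backed stand-in, not the packed_simd type. The vector type itself only offers the vertical comparison plus an inherent ordering method, and a user who wants `<` to mean lexicographic order opts in through a newtype.

```rust
use std::cmp::Ordering;

/// Stand-in for a 4-lane integer vector. It does not implement PartialOrd.
#[derive(Clone, Copy, Debug, PartialEq)]
struct I32x4([i32; 4]);

impl I32x4 {
    /// Vertical comparison: one boolean per lane, like a mask.
    fn lt(self, other: Self) -> [bool; 4] {
        let mut mask = [false; 4];
        for i in 0..4 {
            mask[i] = self.0[i] < other.0[i];
        }
        mask
    }

    /// Hypothetical inherent method: lexicographic order over the lanes.
    fn lex_cmp(self, other: Self) -> Ordering {
        self.0.cmp(&other.0)
    }
}

/// Newtype that opts in to the lexicographic order.
#[derive(Clone, Copy, Debug, PartialEq)]
struct LexOrdered(I32x4);

impl PartialOrd for LexOrdered {
    fn partial_cmp(&self, other: &Self) -> Option<Ordering> {
        Some(self.0.lex_cmp(other.0))
    }
}
```

For `a = I32x4([1, 5, 0, 0])` and `b = I32x4([2, 0, 0, 0])`, the mask `a.lt(b)` is true only in the first lane, yet `LexOrdered(a) < LexOrdered(b)` holds. That is exactly the mismatch that surprises new users.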

improve packed pointer vector arithmetic

  • it is barely tested (it is only used in the tiled implementations of the aobench example)
  • the offset methods are just wrappers over the wrapping_ methods
  • (implement? and) test portable vector shuffles on vectors of pointers

min_element / max_element produce incorrect results for NaNs in the last place

See it live: https://play.rust-lang.org/?gist=2ccd8fb4e41fc1b28a1cc6a2e5774a7b&version=nightly

#![feature(repr_simd)]
#![feature(platform_intrinsics)]

extern "platform-intrinsic" {
    fn simd_reduce_min<T,U>(a: T) -> U;
}

#[repr(simd)]
pub struct F(f32, f32);

pub fn foo(x: F) -> f32 {
    unsafe { simd_reduce_min(x) }
}

fn main() {
    let x = F(1.0, -1.0);
    assert_eq!(foo(x), -1.0); // OK
    let y = F(std::f32::NAN, -1.0);
    assert_eq!(foo(y), -1.0); // OK
    let z = F(-1.0, std::f32::NAN);
    assert_eq!(foo(z), -1.0); // FAILS: returns NaN
}

So I've filed: https://bugs.llvm.org/show_bug.cgi?id=36982

This issue is probably because llvm.vector.reduce.fmin/fmax have the semantics of fcmp instead of minnum/maxnum. We will probably need to work around this issue in stdsimd.

What we want is the same semantics that Rust specifies for min/max on floating-point primitive types. For min this is:

Returns the minimum of the two numbers. If one of the arguments is NaN, then the other argument is returned.

This looks compatible with the specifications of IEEE754-2018 (drafts are here: http://754r.ucbtest.org/drafts/). From the latest draft (P754-233 17-Mar-2018):

**minimumNumber**(x, y) is x if x<y, y if y<x, and the number if one operand is a number and the
other is a NaN. For this operation, −0 compares less than +0. If x=y and signs are the same it is
either x or y. If both operands are NaNs, a quiet NaN is returned, according to 6.2. If either
operand is a signaling NaN, an invalid operation exception is signaled, but unless both operands
are NaNs, the signaling NaN is otherwise ignored and not converted to a quiet NaN as stated in
6.2 for other operations.

**maximumNumber**(x, y) is x if x>y, y if y>x, and the number if one operand is a number and the
other is a NaN. For this operation, +0 compares greater than −0. If x=y and signs are the same it
is either x or y. If both operands are NaNs, a quiet NaN is returned, according to 6.2. If either
operand is a signaling NaN, an invalid operation exception is signaled, but unless both operands
are NaNs, the signaling NaN is otherwise ignored and not converted to a quiet NaN as stated in
6.2 for other operations.
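
To make the difference concrete, here is a scalar model of the two candidate semantics. It is a sketch under assumptions: `min_fcmp` guesses that the reduction behaves like a chain of plain `<` comparisons, which reproduces the results observed above but is not taken from LLVM; `min_num` uses `f32::min`, which is documented to return the other argument when one is NaN. Both helper names are made up.

```rust
/// Reduction built on a plain `<` comparison. Because every comparison
/// with NaN is false, the result depends on where the NaN sits.
/// Panics on an empty slice.
fn min_fcmp(lanes: &[f32]) -> f32 {
    let mut acc = lanes[0];
    for &x in &lanes[1..] {
        acc = if acc < x { acc } else { x };
    }
    acc
}

/// Reduction built on `f32::min`, which ignores a NaN operand.
/// Starting from NaN means the result is NaN only if every lane is NaN.
fn min_num(lanes: &[f32]) -> f32 {
    lanes.iter().copied().fold(f32::NAN, f32::min)
}
```

`min_fcmp(&[f32::NAN, -1.0])` gives -1.0 but `min_fcmp(&[-1.0, f32::NAN])` gives NaN, matching the failing assertion, whereas `min_num` returns -1.0 for both orders.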

f32x2 vector arithmetic ops fail on x86_64-apple-darwin

I cannot reproduce this locally but the following two tests fail on x86_64-apple-darwin on travis:

---- v64::f32x2_ops_scalar_arith::ops_scalar_arithmetic stdout ----
thread 'v64::f32x2_ops_scalar_arith::ops_scalar_arithmetic' panicked at 'assertion failed: `(left == right)`
  left: `f32x2(NaN, 0.0)`,
 right: `f32x2(0.0, 0.0)`', src/v64.rs:43:1
---- v64::f32x2_ops_vector_arith::ops_vector_arithmetic stdout ----
thread 'v64::f32x2_ops_vector_arith::ops_vector_arithmetic' panicked at 'assertion failed: `(left == right)`
  left: `f32x2(NaN, 0.0)`,
 right: `f32x2(0.0, 0.0)`', src/v64.rs:43:1
failures:
    v64::f32x2_ops_scalar_arith::ops_scalar_arithmetic
    v64::f32x2_ops_vector_arith::ops_vector_arithmetic

Enable MSA on MIPS64 CI

Enabling MSA for the mips64 targets produces the following error:

export RUSTFLAGS= -C target-feature=+msa -C target-cpu=mips64r5
cargo test --target=mips64-unknown-linux-gnuabi64 --release
   Compiling cfg-if v0.1.4
   Compiling nodrop v0.1.12
   Compiling packed_simd v0.1.0 (file:///checkout)
   Compiling arrayvec v0.4.7
LLVM ERROR: Cannot select: 0x7fdaf4d96b60: v2i64 = setcc 0x7fdaf47759c0, 0x7fdaf4d960d0, setlt:ch
  0x7fdaf47759c0: v2f64 = vselect 0x7fdaf4775138, 0x7fdaf479dd00, 0x7fdaf479dc98
    0x7fdaf4775138: v2i64 = setcc 0x7fdaf479dd00, 0x7fdaf479dc98, setlt:ch
      0x7fdaf479dd00: v2f64,ch = load<(load 16 from %stack.29, align 32)> 0x7fdaf46ce6e8, FrameIndex:i64<29>, undef:i64
        0x7fdaf4775000: i64 = FrameIndex<29>
        0x7fdaf479d680: i64 = undef
      0x7fdaf479dc98: v2f64,ch = load<(load 16 from %stack.29 + 16)> 0x7fdaf46ce6e8, 0x7fdaf4775618, undef:i64
        0x7fdaf4775618: i64 = or FrameIndex:i64<29>, Constant:i64<16>
          0x7fdaf4775000: i64 = FrameIndex<29>
          0x7fdaf4d96e38: i64 = Constant<16>
        0x7fdaf479d680: i64 = undef
    0x7fdaf479dd00: v2f64,ch = load<(load 16 from %stack.29, align 32)> 0x7fdaf46ce6e8, FrameIndex:i64<29>, undef:i64
      0x7fdaf4775000: i64 = FrameIndex<29>
      0x7fdaf479d680: i64 = undef
    0x7fdaf479dc98: v2f64,ch = load<(load 16 from %stack.29 + 16)> 0x7fdaf46ce6e8, 0x7fdaf4775618, undef:i64
      0x7fdaf4775618: i64 = or FrameIndex:i64<29>, Constant:i64<16>
        0x7fdaf4775000: i64 = FrameIndex<29>
        0x7fdaf4d96e38: i64 = Constant<16>
      0x7fdaf479d680: i64 = undef
  0x7fdaf4d960d0: v2f64,ch = load<(load 16 from %stack.30)> 0x7fdaf4d96138, FrameIndex:i64<30>, undef:i64
    0x7fdaf46ceaf8: i64 = FrameIndex<30>
    0x7fdaf479d680: i64 = undef
In function: _ZN4core3ops8function6FnOnce9call_once17h22e71f119a5a9c96E
error: Could not compile `packed_simd`.

cc @jcowgill

vector scatters broken on mips 32-bit

The portable vector scatter tests fail on mips-unknown-linux-gnu and mipsel-unknown-linux-gnu:

---- vPtr::ptr_mut_x2_write::write stdout ----
thread 'vPtr::ptr_mut_x2_write::write' panicked at 'assertion failed: `(left == right)`
  left: `[0, 1]`,
 right: `[42, 42]`', src/vPtr.rs:10:1

PartialOrd fails on ppc

The following fails on {powerpc,powerpc64,powerpc64le} w/o altivec, vsx.

failures:
---- v128::i128x1_cmp_PartialOrd::partial_ord stdout ----
thread 'v128::i128x1_cmp_PartialOrd::partial_ord' panicked at 'assertion failed: `(left == right)`
  left: `Some(Less)`,
 right: `None`: PartiallyOrdered(i128x1(0)), PartiallyOrdered(i128x1(1))', src/testing/utils.rs:84:5
---- v128::u128x1_cmp_PartialOrd::partial_ord stdout ----
thread 'v128::u128x1_cmp_PartialOrd::partial_ord' panicked at 'assertion failed: `(left == right)`
  left: `Some(Less)`,
 right: `None`: PartiallyOrdered(u128x1(0)), PartiallyOrdered(u128x1(1))', src/testing/utils.rs:84:5
failures:
    v128::i128x1_cmp_PartialOrd::partial_ord
    v128::u128x1_cmp_PartialOrd::partial_ord

Might need to file a rust-lang/rust / LLVM bug about this. It looks very similar to the current 128-bit wide integer bugs in s390x and sparc ( #75 ).

cc @lu-zero

Incorrect inlining on ppc64le

See rust-lang/stdarch#447 (comment)

disassembly for coresimd::coresimd::powerpc::altivec::sealed::assert_vec_add_bc_sc_vaddubm::vec_add_bc_sc_shim: 
	 0: addis r2,r12,16 
	 1: addi r2,r2,31904 
	 2: mflr r0 
	 3: std r0,16(r1) 
	 4: stdu r1,-160(r1) 
	 5: std r30,144(r1) 
	 6: li r3,128 
	 7: addi r30,r1,112 
	 8: stxvd2x vs63,r1,r3 
	 9: mr r3,r30 
	10: vmr v31,v3 
	11: bl 10220 <_ZN65_$LT$T$u20$as$u20$coresimd..coresimd..ppsv..IntoBits$LT$U$GT$$GT$9into_bits17h7dd9bb155da76093E> 
	12: lvx v2,0,r30 
	13: li r3,128 
	14: ld r30,144(r1) 
	15: vaddubm v2,v2,v31 
	16: lxvd2x vs63,r1,r3 
	17: addi r1,r1,160 
	18: ld r0,16(r1) 
	19: mtlr r0 
	20: blr 
	21: 
thread 'coresimd::powerpc::altivec::sealed::assert_vec_add_bc_sc_vaddubm' panicked at 'instruction found, but the disassembly contains too many instructions: #instructions = 22 >= 20 (limit)', crates/stdsimd-test/src/lib.rs:385:9

@alexcrichton wrote:

target triple = "powerpc64le-unknown-linux-gnu"

define internal void @foo(i32*, i32* %self) {
start:
  %1 = load i32, i32* %self
  store i32 %1, i32* %0
  ret void
}

define void @bar(i32* %a, i32* %b) #0 {
start:
  tail call void @foo(i32* %a, i32* %b)
  ret void
}

attributes #0 = { "target-features"="+altivec" }

That's the fully optimized IR and it won't optimize any further. I don't know enough about PowerPC to know whether this is an LLVM bug or not.


@lu-zero wrote:

The problem is more widespread:

This:

    #[inline]
    #[target_feature(enable = "altivec")]
    #[cfg_attr(test, assert_instr(vmladduhm))]
    unsafe fn mladd(a: i16x8, b: i16x8, c: i16x8) -> i16x8 {
        a * b + c
    }

Gets compiled to this:

	 0: addis r2,r12,16
	 1: addi r2,r2,8128
	 2: mflr r0
	 3: std r0,16(r1)
	 4: stdu r1,-160(r1)
	 5: li r3,128
	 6: addis r4,r2,-5
	 7: std r30,144(r1)
	 8: li r5,32
	 9: addi r30,r1,112
	10: stxvd2x vs63,r1,r3
	11: nop
	12: addi r4,r4,-30982
	13: vmr v31,v4
	14: addi r3,r2,-32496
	15: std r4,0(r3)
	16: std r5,8(r3)
	17: mr r3,r30
	18: bl 15b20 <_ZN79_$LT$coresimd..coresimd..ppsv..v128..i16x8$u20$as$u20$core..ops..arith..Mul$GT$3mul17h7c18739225a18fb1E>
	19: vmr v3,v31
	20: lvx v2,0,r30
	21: mr r3,r30
	22: bl 15b00 <_ZN79_$LT$coresimd..coresimd..ppsv..v128..i16x8$u20$as$u20$core..ops..arith..Add$GT$3add17h9568264c0290a004E>
	23: li r3,128
	24: lvx v2,0,r30
	25: ld r30,144(r1)
	26: lxvd2x vs63,r1,r3
	27: addi r1,r1,160
	28: ld r0,16(r1)
	29: mtlr r0
	30: blr
	31:

and that gets it to fail generating the expected instruction.

force-inlining fixes it.
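
A sketch of what force-inlining could look like, assuming it means marking the operator implementations `#[inline(always)]`. `I16x8` here is an array-backed stand-in for the vector type, and the `target_feature` / `assert_instr` attributes are left out so the snippet builds on any target.

```rust
use std::ops::{Add, Mul};

/// Array-backed stand-in for a 128-bit vector of eight i16 lanes.
#[derive(Clone, Copy, Debug, PartialEq)]
struct I16x8([i16; 8]);

impl Mul for I16x8 {
    type Output = Self;

    // Assumption: `inline(always)` keeps the operator from surviving as
    // an out-of-line call in the caller.
    #[inline(always)]
    fn mul(self, rhs: Self) -> Self {
        let mut out = [0i16; 8];
        for i in 0..8 {
            out[i] = self.0[i].wrapping_mul(rhs.0[i]);
        }
        I16x8(out)
    }
}

impl Add for I16x8 {
    type Output = Self;

    #[inline(always)]
    fn add(self, rhs: Self) -> Self {
        let mut out = [0i16; 8];
        for i in 0..8 {
            out[i] = self.0[i].wrapping_add(rhs.0[i]);
        }
        I16x8(out)
    }
}

/// Multiply-add written in terms of the operators, as in the report.
#[inline]
fn mladd(a: I16x8, b: I16x8, c: I16x8) -> I16x8 {
    a * b + c
}
```

With the operators inlined into `mladd`, the `bl` calls to `Mul::mul` and `Add::add` seen in the disassembly should no longer be needed.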

Portable shuffles with run-time indices

There is some support on ARM and x86 for shuffles with run-time indices for some vector types:

  • arm / aarch64: NEON has vector table lookup intrinsics: vtbl{1,2,3,4}
  • x86 / x86_64: SSSE3 pshufb, AVX: vpermilp{s,d}, vperm2f128, AVX2: vpshufb, AVX2/AVX-512F: vperm{d,s}, AVX-512F+VL: vpermt2d

We should at least consider whether offering a portable way of abstracting over these is worth pursuing. The API could look like this:

impl {element_type}{element_width}x{number_of_lanes} {
    fn shuffle(self, indices: Self) -> Self {
        let mut result = Default::default();
        for i in 0..Self::lanes() {
            result = result.replace(i, self.extract(indices.extract(i) as usize));
        }
        result
    }
}

Independently of how good code-generation is for this (we can always file bugs and add our own workarounds here and there), would something like this be worth pursuing?

We could also scale this down to a shuffle_bytes portable operation that is only implemented for unsigned integer vector types (uMxN) since that is what appears to be most widely supported (we could always revisit the full shuffles with run-time indices in the future).
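
A scalar model of the scaled-down `shuffle_bytes` variant, using a plain `[u8; 16]` in place of `u8x16`. How out-of-range indices are treated is an assumption of this sketch (they wrap modulo the lane count); the hardware instructions do not agree on this, so a portable API would have to pick one behaviour.

```rust
/// Selects `v[indices[i]]` for every lane `i`.
/// Assumption: indices wrap modulo 16 instead of panicking or zeroing.
fn shuffle_bytes(v: [u8; 16], indices: [u8; 16]) -> [u8; 16] {
    let mut result = [0u8; 16];
    for i in 0..16 {
        result[i] = v[(indices[i] as usize) % 16];
    }
    result
}
```

Passing the indices 15, 14, ..., 0 reverses the vector, and passing the same index in every lane broadcasts that lane.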

cc @sunfish @TheIronBorn @rkruppe

fine tune how to split the test suite

The test suite was split in #62 into three test groups:

RUSTFLAGS="${ORIGINAL_RUSTFLAGS} --cfg test_v16  --cfg test_v32" cargo_test ${1}
RUSTFLAGS="${ORIGINAL_RUSTFLAGS} --cfg test_v64  --cfg test_v128" cargo_test ${1}
RUSTFLAGS="${ORIGINAL_RUSTFLAGS} --cfg test_v256 --cfg test_v512" cargo_test ${1}

We should fine tune this, and see if a different split, for example:

RUSTFLAGS="${ORIGINAL_RUSTFLAGS} --cfg test_v16  --cfg test_v32 --cfg test_v64" cargo_test ${1}
RUSTFLAGS="${ORIGINAL_RUSTFLAGS} --cfg test_v128 --cfg test_v256" cargo_test ${1}
RUSTFLAGS="${ORIGINAL_RUSTFLAGS} --cfg test_v512" cargo_test ${1}

performs faster on travis.


The following things have already been tried:

  • a full split (one --cfg per command, for 6 commands in total), which performed worse (~55 min for the test suite)
  • a split into two (16,32,64 + 128,256,512) but the memory consumption of that job was too large and the ios build triggered rust-lang/rust#52699.

So a split into three commands might be just what we need.

aarch64 and arm linux-android targets fail to compile on Travis-CI with SIGKILL9

The arm-linux-androideabi and aarch64-linux-android targets fail to compile packed_simd with its tests with the following error messages.

signal: 9, SIGKILL: kill is probably the OOM killer in action.

arm-linux-androideabi

https://travis-ci.org/rust-lang-nursery/packed_simd/jobs/411694486

+ cargo_test
+ cmd='cargo test --verbose --target=arm-linux-androideabi '
+ mkdir target
mkdir: cannot create directory 'target': File exists
+ true
+ cargo test --verbose --target=arm-linux-androideabi
+ tee
Home directory not accessible: Permission denied
pulseaudio: pa_context_connect() failed
pulseaudio: Reason: Connection refused
pulseaudio: Failed to initialize PA contextaudio: Could not init `pa' audio driver
emulator: WARNING: userdata partition is resized from 550 M to 800 M
emulator: WARNING: encryption is off
Your emulator is out of date, please update by launching Android Studio:
 - Start Android Studio
 - Select menu "Tools > Android > SDK Manager"
 - Click "SDK Tools" tab
 - Check "Android Emulator" checkbox
 - Click "OK"
+ [[ 101 != 0 ]]
+ cat target/output
 Downloading arrayvec v0.4.7
 Downloading cfg-if v0.1.4
 Downloading interpolate_idents v0.2.5
 Downloading nodrop v0.1.12
   Compiling cfg-if v0.1.4
     Running `rustc --crate-name cfg_if /cargo/registry/src/github.com-1ecc6299db9ec823/cfg-if-0.1.4/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 -C metadata=fcb79f4f267db7cf -C extra-filename=-fcb79f4f267db7cf --out-dir /checkout/target/arm-linux-androideabi/debug/deps --target arm-linux-androideabi -C linker=arm-linux-androideabi-gcc -L dependency=/checkout/target/arm-linux-androideabi/debug/deps -L dependency=/checkout/target/debug/deps --cap-lints allow -C codegen-units=1`
   Compiling nodrop v0.1.12
     Running `rustc --crate-name nodrop /cargo/registry/src/github.com-1ecc6299db9ec823/nodrop-0.1.12/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 -C metadata=66f8d4eda832dded -C extra-filename=-66f8d4eda832dded --out-dir /checkout/target/arm-linux-androideabi/debug/deps --target arm-linux-androideabi -C linker=arm-linux-androideabi-gcc -L dependency=/checkout/target/arm-linux-androideabi/debug/deps -L dependency=/checkout/target/debug/deps --cap-lints allow -C codegen-units=1`
   Compiling interpolate_idents v0.2.5
     Running `rustc --crate-name interpolate_idents /cargo/registry/src/github.com-1ecc6299db9ec823/interpolate_idents-0.2.5/src/lib.rs --crate-type dylib --emit=dep-info,link -C prefer-dynamic -C debuginfo=2 -C metadata=8843e94d5409e340 -C extra-filename=-8843e94d5409e340 --out-dir /checkout/target/debug/deps -L dependency=/checkout/target/debug/deps --cap-lints allow`
   Compiling packed_simd v0.1.0 (file:///checkout)
     Running `rustc --crate-name packed_simd src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 --cfg 'feature="default"' -C metadata=1b918436c5fe8b47 -C extra-filename=-1b918436c5fe8b47 --out-dir /checkout/target/arm-linux-androideabi/debug/deps --target arm-linux-androideabi -C linker=arm-linux-androideabi-gcc -C incremental=/checkout/target/arm-linux-androideabi/debug/incremental -L dependency=/checkout/target/arm-linux-androideabi/debug/deps -L dependency=/checkout/target/debug/deps --extern cfg_if=/checkout/target/arm-linux-androideabi/debug/deps/libcfg_if-fcb79f4f267db7cf.rlib -C codegen-units=1`
   Compiling arrayvec v0.4.7
     Running `rustc --crate-name arrayvec /cargo/registry/src/github.com-1ecc6299db9ec823/arrayvec-0.4.7/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 -C metadata=5c0b4dc1c0b7d224 -C extra-filename=-5c0b4dc1c0b7d224 --out-dir /checkout/target/arm-linux-androideabi/debug/deps --target arm-linux-androideabi -C linker=arm-linux-androideabi-gcc -L dependency=/checkout/target/arm-linux-androideabi/debug/deps -L dependency=/checkout/target/debug/deps --extern nodrop=/checkout/target/arm-linux-androideabi/debug/deps/libnodrop-66f8d4eda832dded.rlib --cap-lints allow -C codegen-units=1`
     Running `rustc --crate-name endianness tests/endianness.rs --emit=dep-info,link -C debuginfo=2 --test --cfg 'feature="default"' -C metadata=f1be88d9ca1d4ea7 -C extra-filename=-f1be88d9ca1d4ea7 --out-dir /checkout/target/arm-linux-androideabi/debug/deps --target arm-linux-androideabi -C linker=arm-linux-androideabi-gcc -C incremental=/checkout/target/arm-linux-androideabi/debug/incremental -L dependency=/checkout/target/arm-linux-androideabi/debug/deps -L dependency=/checkout/target/debug/deps --extern arrayvec=/checkout/target/arm-linux-androideabi/debug/deps/libarrayvec-5c0b4dc1c0b7d224.rlib --extern cfg_if=/checkout/target/arm-linux-androideabi/debug/deps/libcfg_if-fcb79f4f267db7cf.rlib --extern interpolate_idents=/checkout/target/debug/deps/libinterpolate_idents-8843e94d5409e340.so --extern packed_simd=/checkout/target/arm-linux-androideabi/debug/deps/libpacked_simd-1b918436c5fe8b47.rlib -C codegen-units=1`
     Running `rustc --crate-name packed_simd src/lib.rs --emit=dep-info,link -C debuginfo=2 --test --cfg 'feature="default"' -C metadata=e66949b9438d1b04 -C extra-filename=-e66949b9438d1b04 --out-dir /checkout/target/arm-linux-androideabi/debug/deps --target arm-linux-androideabi -C linker=arm-linux-androideabi-gcc -C incremental=/checkout/target/arm-linux-androideabi/debug/incremental -L dependency=/checkout/target/arm-linux-androideabi/debug/deps -L dependency=/checkout/target/debug/deps --extern arrayvec=/checkout/target/arm-linux-androideabi/debug/deps/libarrayvec-5c0b4dc1c0b7d224.rlib --extern cfg_if=/checkout/target/arm-linux-androideabi/debug/deps/libcfg_if-fcb79f4f267db7cf.rlib --extern interpolate_idents=/checkout/target/debug/deps/libinterpolate_idents-8843e94d5409e340.so -C codegen-units=1`
error: Could not compile `packed_simd`.
Caused by:
  process didn't exit successfully: `rustc --crate-name packed_simd src/lib.rs --emit=dep-info,link -C debuginfo=2 --test --cfg 'feature="default"' -C metadata=e66949b9438d1b04 -C extra-filename=-e66949b9438d1b04 --out-dir /checkout/target/arm-linux-androideabi/debug/deps --target arm-linux-androideabi -C linker=arm-linux-androideabi-gcc -C incremental=/checkout/target/arm-linux-androideabi/debug/incremental -L dependency=/checkout/target/arm-linux-androideabi/debug/deps -L dependency=/checkout/target/debug/deps --extern arrayvec=/checkout/target/arm-linux-androideabi/debug/deps/libarrayvec-5c0b4dc1c0b7d224.rlib --extern cfg_if=/checkout/target/arm-linux-androideabi/debug/deps/libcfg_if-fcb79f4f267db7cf.rlib --extern interpolate_idents=/checkout/target/debug/deps/libinterpolate_idents-8843e94d5409e340.so -C codegen-units=1` (signal: 9, SIGKILL: kill)
+ return 1
/home/travis/.travis/job_stages: line 190:  3258 Terminated              travis_jigger $! $timeout $cmd

aarch64-linux-android

https://travis-ci.org/rust-lang-nursery/packed_simd/jobs/411694487

+ cargo_test
+ cmd='cargo test --verbose --target=aarch64-linux-android '
+ mkdir target
mkdir: cannot create directory 'target': File exists
+ true
+ cargo test --verbose --target=aarch64-linux-android
+ tee
Home directory not accessible: Permission denied
pulseaudio: pa_context_connect() failed
pulseaudio: Reason: Connection refused
pulseaudio: Failed to initialize PA contextaudio: Could not init `pa' audio driver
emulator: WARNING: userdata partition is resized from 550 M to 800 M
emulator: WARNING: encryption is off
Your emulator is out of date, please update by launching Android Studio:
 - Start Android Studio
 - Select menu "Tools > Android > SDK Manager"
 - Click "SDK Tools" tab
 - Check "Android Emulator" checkbox
 - Click "OK"
+ [[ 101 != 0 ]]
+ cat target/output
+ return 1
 Downloading arrayvec v0.4.7
 Downloading cfg-if v0.1.4
 Downloading interpolate_idents v0.2.5
 Downloading nodrop v0.1.12
   Compiling cfg-if v0.1.4
     Running `rustc --crate-name cfg_if /cargo/registry/src/github.com-1ecc6299db9ec823/cfg-if-0.1.4/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 -C metadata=fcb79f4f267db7cf -C extra-filename=-fcb79f4f267db7cf --out-dir /checkout/target/aarch64-linux-android/debug/deps --target aarch64-linux-android -C linker=aarch64-linux-android-gcc -L dependency=/checkout/target/aarch64-linux-android/debug/deps -L dependency=/checkout/target/debug/deps --cap-lints allow -C codegen-units=1`
   Compiling nodrop v0.1.12
     Running `rustc --crate-name nodrop /cargo/registry/src/github.com-1ecc6299db9ec823/nodrop-0.1.12/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 -C metadata=66f8d4eda832dded -C extra-filename=-66f8d4eda832dded --out-dir /checkout/target/aarch64-linux-android/debug/deps --target aarch64-linux-android -C linker=aarch64-linux-android-gcc -L dependency=/checkout/target/aarch64-linux-android/debug/deps -L dependency=/checkout/target/debug/deps --cap-lints allow -C codegen-units=1`
   Compiling interpolate_idents v0.2.5
     Running `rustc --crate-name interpolate_idents /cargo/registry/src/github.com-1ecc6299db9ec823/interpolate_idents-0.2.5/src/lib.rs --crate-type dylib --emit=dep-info,link -C prefer-dynamic -C debuginfo=2 -C metadata=8843e94d5409e340 -C extra-filename=-8843e94d5409e340 --out-dir /checkout/target/debug/deps -L dependency=/checkout/target/debug/deps --cap-lints allow`
   Compiling packed_simd v0.1.0 (file:///checkout)
     Running `rustc --crate-name packed_simd src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 --cfg 'feature="default"' -C metadata=1b918436c5fe8b47 -C extra-filename=-1b918436c5fe8b47 --out-dir /checkout/target/aarch64-linux-android/debug/deps --target aarch64-linux-android -C linker=aarch64-linux-android-gcc -C incremental=/checkout/target/aarch64-linux-android/debug/incremental -L dependency=/checkout/target/aarch64-linux-android/debug/deps -L dependency=/checkout/target/debug/deps --extern cfg_if=/checkout/target/aarch64-linux-android/debug/deps/libcfg_if-fcb79f4f267db7cf.rlib -C codegen-units=1`
   Compiling arrayvec v0.4.7
     Running `rustc --crate-name arrayvec /cargo/registry/src/github.com-1ecc6299db9ec823/arrayvec-0.4.7/src/lib.rs --crate-type lib --emit=dep-info,link -C debuginfo=2 -C metadata=5c0b4dc1c0b7d224 -C extra-filename=-5c0b4dc1c0b7d224 --out-dir /checkout/target/aarch64-linux-android/debug/deps --target aarch64-linux-android -C linker=aarch64-linux-android-gcc -L dependency=/checkout/target/aarch64-linux-android/debug/deps -L dependency=/checkout/target/debug/deps --extern nodrop=/checkout/target/aarch64-linux-android/debug/deps/libnodrop-66f8d4eda832dded.rlib --cap-lints allow -C codegen-units=1`
     Running `rustc --crate-name endianness tests/endianness.rs --emit=dep-info,link -C debuginfo=2 --test --cfg 'feature="default"' -C metadata=f1be88d9ca1d4ea7 -C extra-filename=-f1be88d9ca1d4ea7 --out-dir /checkout/target/aarch64-linux-android/debug/deps --target aarch64-linux-android -C linker=aarch64-linux-android-gcc -C incremental=/checkout/target/aarch64-linux-android/debug/incremental -L dependency=/checkout/target/aarch64-linux-android/debug/deps -L dependency=/checkout/target/debug/deps --extern arrayvec=/checkout/target/aarch64-linux-android/debug/deps/libarrayvec-5c0b4dc1c0b7d224.rlib --extern cfg_if=/checkout/target/aarch64-linux-android/debug/deps/libcfg_if-fcb79f4f267db7cf.rlib --extern interpolate_idents=/checkout/target/debug/deps/libinterpolate_idents-8843e94d5409e340.so --extern packed_simd=/checkout/target/aarch64-linux-android/debug/deps/libpacked_simd-1b918436c5fe8b47.rlib -C codegen-units=1`
     Running `rustc --crate-name packed_simd src/lib.rs --emit=dep-info,link -C debuginfo=2 --test --cfg 'feature="default"' -C metadata=e66949b9438d1b04 -C extra-filename=-e66949b9438d1b04 --out-dir /checkout/target/aarch64-linux-android/debug/deps --target aarch64-linux-android -C linker=aarch64-linux-android-gcc -C incremental=/checkout/target/aarch64-linux-android/debug/incremental -L dependency=/checkout/target/aarch64-linux-android/debug/deps -L dependency=/checkout/target/debug/deps --extern arrayvec=/checkout/target/aarch64-linux-android/debug/deps/libarrayvec-5c0b4dc1c0b7d224.rlib --extern cfg_if=/checkout/target/aarch64-linux-android/debug/deps/libcfg_if-fcb79f4f267db7cf.rlib --extern interpolate_idents=/checkout/target/debug/deps/libinterpolate_idents-8843e94d5409e340.so -C codegen-units=1`
error: Could not compile `packed_simd`.
Caused by:
  process didn't exit successfully: `rustc --crate-name packed_simd src/lib.rs --emit=dep-info,link -C debuginfo=2 --test --cfg 'feature="default"' -C metadata=e66949b9438d1b04 -C extra-filename=-e66949b9438d1b04 --out-dir /checkout/target/aarch64-linux-android/debug/deps --target aarch64-linux-android -C linker=aarch64-linux-android-gcc -C incremental=/checkout/target/aarch64-linux-android/debug/incremental -L dependency=/checkout/target/aarch64-linux-android/debug/deps -L dependency=/checkout/target/debug/deps --extern arrayvec=/checkout/target/aarch64-linux-android/debug/deps/libarrayvec-5c0b4dc1c0b7d224.rlib --extern cfg_if=/checkout/target/aarch64-linux-android/debug/deps/libcfg_if-fcb79f4f267db7cf.rlib --extern interpolate_idents=/checkout/target/debug/deps/libinterpolate_idents-8843e94d5409e340.so -C codegen-units=1` (signal: 9, SIGKILL: kill)
/home/travis/.travis/job_stages: line 190:  3265 Terminated              travis_jigger $! $timeout $cmd

Basic iteration support / from_slice_* performance & safety.

Not sure how much of this should be handled through an RFC (or 3rd party libraries), but I feel basic iteration support could increase performance, safety and ergonomics of std::simd:

To iterate over a slice today, you have to call one of the from_slice_* methods. Either from_slice_aligned, which will do internal sanity checks per call, or from_slice_aligned_unchecked, which will just reinterpret the slice.

Comparing them in a real-world application gives me these performance numbers:

fn compute_inner_kernel_simdf32x8(sv: &[f32], feature: &[f32], gamma: f32) -> f64 {
    type f32s = f32x8;

    let width = f32s::lanes();
    let steps = sv.len() / width;

    let mut sum = f32s::splat(0.0);

    for i in 0..steps {
        // When benchmarking `csvm_predict_sv1024_attr1024_problems1` with AVX2:

        // 238,928 ns / iter
        let a = unsafe { f32s::from_slice_aligned_unchecked(&sv[i * width..]) };
        let b = unsafe { f32s::from_slice_aligned_unchecked(&feature[i * width..]) };

        // 237,541 ns / iter
        // let a = unsafe { f32s::from_slice_unaligned_unchecked(&sv[i * width..]) };
        // let b = unsafe { f32s::from_slice_unaligned_unchecked(&feature[i * width..]) };

        // 343,970 ns / iter
        // let a = f32s::from_slice_aligned(&sv[i * width..]);
        // let b = f32s::from_slice_aligned(&feature[i * width..]);

        // 363,796 ns / iter
        // let a = f32s::from_slice_unaligned(&sv[i * width..]);
        // let b = f32s::from_slice_unaligned(&feature[i * width..]);

        // Add result
        sum += (a - b) * (a - b);
    }

    f64::from((-gamma * sum.sum()).exp())
}
  

In other words, going by the numbers above, the safe checked versions are roughly 44% to 53% slower than their unsafe unchecked counterparts.

I would prefer having a safe version, that is about as fast as the unchecked one.

I think this could be achieved with basic iterators over [f32] (and similar) slices. Nothing as sophisticated as the faster crate, which transparently handles partial vectors, but maybe something like f32x8::iterator(my_slice) that performs all sanity checks on the first call and then generates all subsequent elements without further checks.
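A minimal sketch of the "check once, then iterate" idea, using only the standard library. Everything here is hypothetical and not part of packed_simd: `F32x8` is a plain stand-in struct for `f32x8`, and `LaneIter` is an illustrative name. The constructor validates alignment and computes the number of full vectors once; `next` then reads without any per-element checks.

```rust
use std::marker::PhantomData;

/// Number of lanes in the stand-in vector type (mirrors f32x8).
const LANES: usize = 8;

/// Hypothetical stand-in for `f32x8`: eight f32 lanes, 32-byte aligned.
#[derive(Clone, Copy, Debug, PartialEq)]
#[repr(C, align(32))]
struct F32x8([f32; LANES]);

/// Hypothetical iterator: all validation happens in `new`, none in `next`.
struct LaneIter<'a> {
    ptr: *const F32x8,
    remaining: usize,
    _marker: PhantomData<&'a [f32]>,
}

impl<'a> LaneIter<'a> {
    /// Checks alignment once. Returns `None` if the slice does not start
    /// on a 32-byte boundary. Trailing elements that do not fill a whole
    /// vector are ignored (no handling of partials).
    fn new(slice: &'a [f32]) -> Option<Self> {
        let ptr = slice.as_ptr();
        if (ptr as usize) % std::mem::align_of::<F32x8>() != 0 {
            return None;
        }
        Some(LaneIter {
            ptr: ptr.cast(),
            remaining: slice.len() / LANES,
            _marker: PhantomData,
        })
    }
}

impl<'a> Iterator for LaneIter<'a> {
    type Item = F32x8;

    fn next(&mut self) -> Option<F32x8> {
        if self.remaining == 0 {
            return None;
        }
        // SAFETY: `new` verified the alignment and that `remaining` full
        // vectors lie inside the borrowed slice.
        let v = unsafe { self.ptr.read() };
        // SAFETY: advances at most to one past the end of the slice.
        self.ptr = unsafe { self.ptr.add(1) };
        self.remaining -= 1;
        Some(v)
    }
}
```

With such an iterator the kernel above could iterate over `sv` and `feature` in lockstep (for example with `zip`) while keeping the `unsafe` confined to one audited place. Whether this actually matches the unchecked timings would need to be benchmarked.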

LLVM floating-point math intrinsics fail on s390x-unknown-linux-gnu

On s390x-unknown-linux-gnu, the crate fails to compile due to errors in the following floating-point vector functions:

  • abs
  • cos
  • fma
  • sin
  • sqrt

The errors are all of the form:

Intrinsic has incorrect return type!
void (<16 x float>*, <16 x float>*, <16 x float>*, <16 x float>*)* @llvm.fma.v16f32
LLVM ERROR: Broken function found, compilation aborted!

This is currently worked around by falling back to scalar code.
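As a rough illustration of what a scalar fallback can look like, the sketch below computes a 16-lane fused multiply-add one lane at a time instead of calling the vector intrinsic. The function name `fma_scalar` and the use of plain arrays are assumptions made for this example; this is not the crate's actual workaround code.

```rust
/// Hypothetical scalar fallback: computes a * b + c per lane using the
/// scalar `f32::mul_add` instead of a 16-wide vector intrinsic.
fn fma_scalar(a: [f32; 16], b: [f32; 16], c: [f32; 16]) -> [f32; 16] {
    std::array::from_fn(|i| a[i].mul_add(b[i], c[i]))
}
```

In a real crate such a function would typically be selected only for the affected target, for instance behind `#[cfg(target_arch = "s390x")]`.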
