
umesimd's Introduction

NOTE: UME::Vector library has been moved to github! Please see: https://github.com/edanor/umevector


Current stable release is: v0.6.1-stable
To check out the stable release, use:

git clone https://[email protected]/edanor/umesimd.git
git checkout tags/v0.6.1-stable

UME::SIMD is an explicit vectorization library. The library defines a homogeneous interface for accessing the functionality of SIMD registers of the AVX, AVX2, AVX512 and IMCI (KNCNI, k1om) instruction sets.

Draft of the UME::SIMD specification: UME::SIMD spec

This piece of code was developed as part of the ICE-DIP project at CERN.

"ICE-DIP is a European Industrial Doctorate project funded by the European Community's 7th Framework programme Marie Curie Actions under grant PITN-GA-2012-316596".

All questions should be submitted using the bug tracking system:

bug tracker

or by sending e-mail to:

[email protected]

Please refer to the wiki for introduction and additional information:

wiki pages

RELEASE NOTES for v0.6.1-stable

Interface:

  • Add GATHERU/SCATTERU (uniform stride gather/scatter)
  • Add operators for LSH/RSH
  • Add LAND/LOR (logical AND/OR) for integer types
  • Add REM (division remainder) operation for integer types
  • Add SORTA/D (sort using Ascending/Descending order)
  • Add BANDNOT/LANDNOT (Bitwise/Logical AND-NOT)
  • Add COPYSIGN to interface
  • Allow SET-CONSTR to use scalar types other than SIMDVec base type
  • Add SCALAR_FLOAT_T to traits classes
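
To illustrate what a uniform stride gather/scatter means, here is a minimal scalar sketch. It is not the UME::SIMD implementation or API: the helper names `gather_uniform` and `scatter_uniform` are hypothetical, and the lanes are modelled with a plain `std::array`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helper, not part of UME::SIMD: reads N elements that sit
// `stride` elements apart in memory, starting at `base`.
template <typename T, std::size_t N>
std::array<T, N> gather_uniform(const T* base, std::size_t stride) {
    assert(base != nullptr);
    std::array<T, N> lanes{};
    for (std::size_t i = 0; i < N; ++i) {
        lanes[i] = base[i * stride];
    }
    return lanes;
}

// Hypothetical helper: writes lane i back to base[i * stride].
template <typename T, std::size_t N>
void scatter_uniform(const std::array<T, N>& lanes, T* base, std::size_t stride) {
    assert(base != nullptr);
    for (std::size_t i = 0; i < N; ++i) {
        base[i * stride] = lanes[i];
    }
}
```

With a stride of 2, a 4-lane gather reads elements 0, 2, 4 and 6 of the source array, and the matching scatter writes to the same positions while leaving the elements in between untouched.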

Performance tuning:

  • Add generalized, vectorized EXP
  • Add generalized, vectorized implementation for LOG/MLOG

AVX:

  • Add specializations for EXP & MEXP
  • SIMD8_32/64f: Use SVML for sin/cos functions

AVX2:

  • Add specializations for EXP & MEXP

AVX512:

  • 32/64 u/i/f : M/GATHERU, M/SCATTERU
  • 32/64 u/i/f : M/GATHERS/V, M/SCATTERS/V
  • 32 u: GATHER/SCATTER
  • Add specialization for EXP
  • Specialized SIN/COS/SINCOS
  • Force inline on MFI functions.
  • Bulk update (SIN/COS/EXP/LOG/LSH/RSH/FLOOR/CEIL)
  • Bulk update mask types.
  • Bulk performance upgrade.

OPENMP:

  • Added openmp plugin. The plugin can be forced with the -DFORCE_OPENMP compilation flag

Benchmarks:

  • Increase benchmark automation by adding Makefiles and testing scripts
  • Add compilation with forced scalar plugin
  • Refactor microbenchmark codes to remove g++/clang++ warnings
  • Create separate directories for all benchmarks
  • Add matrix multiplication (matmul) benchmark (WIP)
  • Add placeholder for openmp based implementation for mandelbrot2
  • Add 'explog' microbenchmark to test EXP and LOGx operations

Fixes:

  • AVX: Missing 'const' qualifier in operator-
  • AVX2: Fixes for failing tests: FTOU, SADD, SUBFROM
  • AVX: IMIN/IMAX fix
  • Fix MIMAX/MIMIN errors
  • AVX512: SIMD16_32u: fixes for DIV
  • Scalar: force inline on plugin functions
  • AVX512: Fixes for failing MIN/MAX tests
  • AVX512: GATHER/SCATTER fails
  • COPYSIGN - scalar kernel updated
  • Fix some build problems when using OpenMP plugin
  • Assignment operators (+=, -=, /=) returning value and not reference
  • AVX512: SIMD2_64f incorrect implementation for EXP

Tests:

  • Use SVML when building with ICC
  • Use randomly generated data sets for LSH/RSH
  • Fix alignment problem in GATHER/SCATTER tests
  • Add tests for IMIN/IMAX
  • Replace CMake build with makefile. Enable parallel compilation
  • Add VS2015 solution for unit tests
  • Add test for SWIZZLE
  • Add tests for PACK, PACKLO, PACKHI, UNPACK, UNPACKLO, UNPACKHI
  • Allow building with OpenMP plugins

Internal code:

  • Move emulation warnings from scalar emulation functions to interface methods

RELEASE NOTES for v0.5.1-stable

Interface:

  • Fix function name for mask interface LOAD.
  • Inverse logic for BLEND operations.
  • Add Non-temporal load/store operations (SSTORE/SLOAD).

Performance tuning:

Scalar:

  • Add specialized implementation for SIMD4/8x32.

AVX:

  • SIMD4_64f + SIMDMask4
  • Simplify ABS
  • Enable performance for SIMDx_64

AVX2:

  • SIMD4_64f + SIMDMask4

AVX512:

  • MASK4 + MASK8 add missing operators.
  • SIMD1_64f TRUNC/MTRUNC
  • SIMD16_64x

Benchmarks:

  • Add QuadraticSolver microbenchmark.
  • Add 'SINCOS' benchmark.
  • Update displayed information.
  • Modifications to prohibit streaming-stores optimization.

Fixes:

  • KNC: add missing 'const' function qualifiers.
  • Incorrect mask used for write mask operator.
  • AVX: Incorrect logic for CMPLT
  • SIMD8_64f replace fast reciprocal with precise one.
  • AVX2: incorrect intermediate mask used when '()' enabled
  • FIX: Saturated addition scalar emulation kernel.
  • AVX: SIMD16_64f - use unaligned load instructions.
  • AVX: FTOI - use C++ compatible conversions.
  • AVX: ROUND - use double precision version of std::round
  • AVX: SIMD4_64f: CMPEQ/CMPNE incorrect masks returned.
  • AVX2: fix uninitialized memory bug in avx2 hland function
  • Fix compilation errors using GCC/Clang
  • AVX512: Incorrect kernels for pack/unpack

Tests:

  • Use randomly generated tests for badly defined scenarios.
  • Add unit tests for MLOAD/MSTORE

Internal code:

  • Force inlining on interface and emulation. New defines: UME_FORCE_INLINE, UME_NEVER_INLINE
  • Remove declspec from interface emulation.
  • Add template specialization forward declarations.
  • Split emulation into pure scalar and vector based.
  • Propagate scalar emulation changes to plugins.

RELEASE NOTES for v0.4.1-stable

Interface:

  • Faster ROL/ROR emulation using LOAD/STORE

  • Aliases for vector types. Now possible to use SIMDVec<BASE_T, VEC_LEN> instead of SIMDVec_u/i/f<BASE_T, VEC_LEN>

  • Added non-member function interface. It is now possible to do:

        add(vec_a, vec_b);

    instead of:

        vec_a.add(vec_b);
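
The relationship between the two call styles can be sketched with a toy type. This is only an illustration under stated assumptions: `ToyVec` and `check_interfaces_agree` are hypothetical names, not UME::SIMD classes, and the free function simply forwards to the member function.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy stand-in for a SIMD vector type; NOT the UME::SIMD class.
template <typename T, std::size_t N>
struct ToyVec {
    std::array<T, N> lanes{};

    // Member function style: vec_a.add(vec_b)
    ToyVec add(const ToyVec& other) const {
        ToyVec result;
        for (std::size_t i = 0; i < N; ++i) {
            result.lanes[i] = lanes[i] + other.lanes[i];
        }
        return result;
    }
};

// Non-member style: add(vec_a, vec_b) forwards to the member function.
template <typename T, std::size_t N>
ToyVec<T, N> add(const ToyVec<T, N>& a, const ToyVec<T, N>& b) {
    return a.add(b);
}

// Usage: both spellings produce the same lanes.
inline void check_interfaces_agree() {
    ToyVec<int, 4> vec_a, vec_b;
    vec_a.lanes = {1, 2, 3, 4};
    vec_b.lanes = {10, 20, 30, 40};
    assert(vec_a.add(vec_b).lanes == add(vec_a, vec_b).lanes);
}
```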

Performance tuning:

  • Major updates for AVX, AVX2 and AVX512.

Benchmarks:

Fixes:

  • KNC: add missing 'const' function qualifiers.
  • KNL: MULV - incorrect temporaries.
  • Fix compilation warnings with -Wall (GCC/ICC).
  • Fix multiple errors in unit test data sets.
  • Fix narrowing conversion errors in unit test data sets.

Examples:

  • Add example using scalar constant literals in templates.

RELEASE NOTES for v0.3.2-stable

Interface:

  • reintroduced mask-assignment operations on masks
  • gather scatter using scalar types of correlated precision

Performance tuning:

AVX:

  • performance improvements: SIMDMask4, SIMD2_32x, SIMD4_32x, SIMD8_32x

AVX2:

  • performance improvements: SIMDMask4, SIMD2_32x, SIMD4_32x, SIMD8_32x

AVX512:

  • missing operators SIMD4_32u
  • performance improvements: SIMD4_32f

Benchmarks:

  • extend benchmarks with uniform statistics
  • statistics now also include 90% and 95% confidence intervals
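
As a rough idea of how such intervals can be derived from benchmark timings, the sketch below uses a normal approximation with the standard z-values 1.645 (90%) and 1.960 (95%). This is an assumption for illustration only; the benchmark code in the library may compute its intervals differently, and `TimingStats` and `summarize` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical summary record, not a UME::SIMD benchmark type.
struct TimingStats {
    double mean;
    double half_width_90;  // interval is mean +/- half_width_90
    double half_width_95;  // interval is mean +/- half_width_95
};

// Mean plus normal-approximation confidence intervals for the mean.
inline TimingStats summarize(const std::vector<double>& samples) {
    assert(samples.size() >= 2);
    const double n = static_cast<double>(samples.size());

    double sum = 0.0;
    for (double s : samples) {
        sum += s;
    }
    const double mean = sum / n;

    double squared_dev = 0.0;
    for (double s : samples) {
        squared_dev += (s - mean) * (s - mean);
    }
    // Standard error: sample standard deviation divided by sqrt(n).
    const double std_err = std::sqrt(squared_dev / (n - 1.0)) / std::sqrt(n);

    return TimingStats{mean, 1.645 * std_err, 1.960 * std_err};
}
```

For small sample counts a Student's t quantile would be a more careful choice than the fixed z-values used here.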

RELEASE NOTES for v0.3.1-stable

Interface:

  • added PROMOTE/DEGRADE operations to convert between vectors using scalars of different precision (e.g. PROMOTE SIMD4_32f to SIMD4_64f)
  • added LOG, LOG2, LOG10 to floating point interface
  • added CMPEQS/CMPNES for masks
  • added compilation flag to switch between '[]' and '()' syntax for writemasks
  • added overloaded operators for mixed scalar<->vector operations (Issue #25)
  • added missing operator= (ISSUE #26)
  • added writemask operators for scalars (e.g. vec[mask] = scalar)
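
The `vec[mask] = scalar` form can be pictured as a proxy object that writes only the selected lanes. The sketch below is a toy model under that assumption, not the library's implementation: `SketchMask`, `MaskedLanes`, `SketchVec` and `masked_assign_demo` are hypothetical names.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Toy mask: one boolean per lane. NOT the UME::SIMD mask type.
template <std::size_t N>
struct SketchMask {
    std::array<bool, N> bits{};
};

// Proxy returned by vec[mask]; assigning a scalar to it writes only the
// lanes whose mask bit is set.
template <typename T, std::size_t N>
class MaskedLanes {
public:
    MaskedLanes(std::array<T, N>& lanes, const SketchMask<N>& mask)
        : lanes_(lanes), mask_(mask) {}

    MaskedLanes& operator=(T scalar) {
        for (std::size_t i = 0; i < N; ++i) {
            if (mask_.bits[i]) {
                lanes_[i] = scalar;
            }
        }
        return *this;
    }

private:
    std::array<T, N>& lanes_;
    const SketchMask<N>& mask_;
};

// Toy vector: NOT the UME::SIMD vector class.
template <typename T, std::size_t N>
struct SketchVec {
    std::array<T, N> lanes{};

    MaskedLanes<T, N> operator[](const SketchMask<N>& mask) {
        return MaskedLanes<T, N>(lanes, mask);
    }
};

// Usage: overwrite lanes 1 and 3 only; lanes 0 and 2 keep their values.
inline SketchVec<int, 4> masked_assign_demo() {
    SketchVec<int, 4> vec;
    vec.lanes = {1, 2, 3, 4};
    SketchMask<4> mask;
    mask.bits = {false, true, false, true};
    vec[mask] = 0;
    assert(vec.lanes[0] == 1);
    return vec;
}
```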

Performance tuning:

  • AVX512 (SKX + KNL): SIMDMask4/8/32, SIMD4_32u/i/f, SIMD8_32u/i/f, SIMD32_32u/i/f, (extensive update)

Bug fixes:

  • AVX2: SIMD4_32f missing 'const' qualifier in STORE
  • AVX: add missing definitions for float vectors
  • AVX512: separation between different AVX512 ISA variations
  • AVX512: missing explicit in constructor of SIMDMask<8>

Examples:

  • added basic example for SIMD vector showing MFI (Member Function Interface) and operator syntax.

Tests:

  • added generic unit tests for SIMD using 64f, 8u/i and 16u/i scalar types

Benchmarks:

  • added Latencies benchmark allowing monitoring of library performance with instruction-level granularity
  • added Mandelbrot Set benchmark

Internal code:

  • fixed missing template<> for templated cast operators
  • extended NullTypes and eliminated SIMD1 template specializations; this change simplifies the plugin system and fixes loose ends of the typeset system
  • removed 'final' class specifiers to allow custom extensions of SIMD types; this change allows using SIMD types as base classes for custom vectorization interfaces

