avx-analysis's Introduction

AVX Performance Analysis

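This project measures the speedup provided by the AVX2 extensions on Intel processors. It also works with the original AVX; one simply has to note that AVX already has 256-bit registers but supports only floating-point operations at that width, while AVX2 extends 256-bit operations to integers. The 128-bit registers belong to the earlier SSE extensions.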

AVX Registers Overview

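AVX2 supports 256-bit operations on integer data. This means that an AVX2-enabled unit can operate on any of the following in one register:

  • 4 x 64 bit operands
  • 8 x 32 bit operands
  • 16 x 16 bit operands
  • 32 x 8 bit operands

As long as the operands can be packed into a 256-bit register, they are processed by a single AVX2 instruction. Each instruction treats the register as lanes of one width, so a mix such as 2 x 64 bit + 4 x 32 bit fills the register but needs a separate instruction per width.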
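The packing idea can be sketched in C. This is a minimal illustration, not the project's code: it assumes GCC or Clang and uses their `vector_size` extension instead of AVX2 intrinsics, so it compiles with or without AVX enabled. The names `v4i64`, `lanes_per_256`, and `add4_i64` are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* GCC/Clang vector extension: four 64-bit lanes in one 256-bit value. */
typedef int64_t v4i64 __attribute__((vector_size(32)));

/* Number of elements of the given size (in bytes) that fit in 256 bits. */
int lanes_per_256(int element_size_bytes) {
    return 32 / element_size_bytes;
}

/* Hypothetical helper: lane-wise add of four 64-bit integers.
   When AVX2 is enabled the compiler can emit one 256-bit add here;
   otherwise it splits the work into narrower instructions. */
void add4_i64(const int64_t a[4], const int64_t b[4], int64_t out[4]) {
    v4i64 va, vb, vsum;
    memcpy(&va, a, sizeof va);   /* pack the four operands */
    memcpy(&vb, b, sizeof vb);
    vsum = va + vb;              /* one add per lane */
    memcpy(out, &vsum, sizeof vsum);
}
```

Calling `lanes_per_256` with 8, 4, 2, and 1 gives 4, 8, 16, and 32 lanes, matching the element sizes the benchmark accepts.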

Performance Analysis

TL;DR

Choose the size N of each data element (1, 2, 4, or 8 bytes). The program performs operations on an array of elements of size N so that they can be packed into a 256-bit register. Run the code on a single CPU core; on Intel systems the number of active cores can usually be set in the BIOS options. To see the difference in speed, compile the code twice, once with AVX enabled and once with it disabled.
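A measurement of this kind might look like the sketch below. It is an assumption about the general shape of such a benchmark, not the contents of avx_performance.c: the function names are hypothetical, the element size is fixed at 4 bytes for brevity, and the empty `__asm__` statement is a GCC/Clang-specific barrier that keeps the optimizer from collapsing the repeated calls.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Element-wise add of 4-byte elements. At -O3 GCC may auto-vectorize
   this loop, using 256-bit adds when AVX2 is enabled. */
void add_u32(const uint32_t *restrict a, const uint32_t *restrict b,
             uint32_t *restrict out, size_t n) {
    for (size_t i = 0; i < n; i++) {
        out[i] = a[i] + b[i];
    }
}

/* Run the add `repeats` times and return the elapsed CPU time in seconds. */
double time_add_u32(const uint32_t *a, const uint32_t *b, uint32_t *out,
                    size_t n, int repeats) {
    clock_t start = clock();
    for (int r = 0; r < repeats; r++) {
        add_u32(a, b, out, n);
        /* Compiler barrier (GCC/Clang): forces the stores to be kept. */
        __asm__ volatile("" : : "r"(out) : "memory");
    }
    clock_t end = clock();
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Short runs fall below the resolution of `clock()`, so a large repeat count is needed before the two builds can be compared meaningfully.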

Preparing your computer

Make sure you are running on a single core, which can be configured in the BIOS options during startup. This gives the most controlled experiment. Then compile two versions of the code with GCC:

  • gcc -o avx_enabled.app -O3 -march=native avx_performance.c
  • gcc -o avx_disabled.app -O3 -march=native -mno-sse4 avx_performance.c

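Both builds are optimized for your CPU. The first uses AVX where available; in the second, -mno-sse4 disables SSE4 and the extensions GCC treats as depending on it, which include AVX and AVX2.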
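One way to confirm which build is which is to have the program report the SIMD level it was compiled for. The sketch below relies on the `__AVX__` and `__AVX2__` macros that GCC and Clang predefine when those extensions are enabled; the function name `build_simd_level` is hypothetical and not part of the project.

```c
/* Report which SIMD level the compiler was allowed to use for this build.
   GCC and Clang predefine __AVX2__ and __AVX__ when the corresponding
   extensions are enabled by the command-line flags. */
const char *build_simd_level(void) {
#if defined(__AVX2__)
    return "AVX2";
#elif defined(__AVX__)
    return "AVX";
#else
    return "no AVX";
#endif
}
```

Printing this string at startup makes it harder to mix up the two binaries when comparing timings.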

If you are unsure whether your CPU supports AVX, generate the assembly and look for instructions that operate on ymm registers, such as VADDPD (or VPADDQ for integer adds). Create the assembly with:

  • gcc -o avx_enabled.asm -O3 -march=native -S avx_performance.c
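Reading the assembly shows what the compiler emitted, not what the CPU can run. As a complementary check, GCC and Clang provide the `__builtin_cpu_supports` builtin on x86 targets. The wrapper below is a sketch with a hypothetical name; it returns -1 when the builtin is unavailable.

```c
/* Returns 1 if the running CPU reports AVX2 support, 0 if it does not,
   and -1 when the check is unavailable (non-x86 target or other compiler). */
int cpu_has_avx2(void) {
#if (defined(__GNUC__) || defined(__clang__)) && \
    (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();  /* initialize the CPU feature data */
    return __builtin_cpu_supports("avx2") ? 1 : 0;
#else
    return -1;
#endif
}
```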

Running the code

Once compiled, run either binary with the same arguments: ./avx_enabled.app <element_size> <operands> or ./avx_disabled.app <element_size> <operands>, where element_size is the size of each element in bytes (1, 2, 4, or 8) and operands is the number of operands to add.

For example, adding four 8-byte elements fills exactly one 256-bit register and can be done in one AVX2 instruction. Run both builds with:

  • ./avx_enabled.app 8 4
  • ./avx_disabled.app 8 4
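The two command-line arguments could be validated along these lines. This is a sketch under the assumption that the program takes exactly the two arguments described above; `parse_args` is a hypothetical helper, not taken from avx_performance.c.

```c
#include <stdlib.h>

/* Parse <element_size> <operands> from the command line.
   Returns 0 on success and -1 on invalid input (hypothetical helper). */
int parse_args(int argc, char **argv, int *element_size, long *operands) {
    char *end;
    long size, count;

    if (argc != 3) {
        return -1;
    }
    size = strtol(argv[1], &end, 10);
    /* Only element sizes of 1, 2, 4, or 8 bytes are accepted. */
    if (*end != '\0' || (size != 1 && size != 2 && size != 4 && size != 8)) {
        return -1;
    }
    count = strtol(argv[2], &end, 10);
    if (*end != '\0' || count <= 0) {
        return -1;
    }
    *element_size = (int)size;
    *operands = count;
    return 0;
}
```

Rejecting unsupported sizes early keeps the timing loop from running on a configuration that cannot be packed into whole lanes.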

Finally, compare the reported execution times. Feel free to repeat with different parameters to see how the speedup changes with element size and operand count.

avx-analysis's People

Contributors

josecc
