Giter VIP home page Giter VIP logo

optimized_avx2_quicksort's Introduction

optimized_AVX2_quicksort

A quicksort using SIMD instructions to optimize performance.

This is just a fun experiment to see what can be achieved pulling out all the stops. This project is still a work in progress.

There's a temporary generic sort in there just to check if the rest of the code works until I figure out how to get the SIMD instructions to behave with arrays not in multiples of 8. Also I'm pretty sure there's something wrong with the mask calculations.

Proper decription (also found in main.rs):

It uses AVX2 SIMD instructions to perform partitioning.

  • The avx2_partition function takes a mutable slice of 32-bit integers and a pivot value, and uses SIMD instructions to partition the slice into two sub-slices: one with elements less than the pivot, and one with elements greater than or equal to the pivot.

  • The function makes use of AVX2 instructions like _mm256_cmpgt_epi32 and _mm256_movemask_ps to extract the comparison results from the SIMD registers and update the indices for the two sub-slices. Ideally.

  • It uses loop unrolling and prefetching to reduce memory latency and improve cache utilization.

  • The quicksort function checks the length of the slice and decides whether to use Quicksort or a simple insertion sort for small arrays.

  • For larger arrays, it selects the pivot value and partitions the array using the avx2_partition function. It then recursively sorts the two sub-slices using Quicksort.

  • To improve performance, the function unrolls the recursion loop and prefetches data into the CPU cache, reducing memory latency and improving cache utilization.

Why use unsafe? This program uses unsafe to directly manipulate memory and call CPU-specific instructions.

  • The avx2_partition function uses unsafe Rust to directly load and store data from memory using AVX2 instructions.

  • It also uses the unreachable_unchecked function to indicate to the compiler that a particular code path is unreachable, allowing it to generate more efficient machine code.

  • This implementation is highly optimized for performance, but it also makes use of advanced features and techniques that may not be appropriate or safe in all contexts.

optimized_avx2_quicksort's People

Contributors

kelly-gibson avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.