Parallel Sort Code in C++

The code provided in this repository implements a parallel sort algorithm that follows the same conventions as std::sort(). It works well on large data sets, and performance improves with any number of threads up to the point where memory bandwidth is the limit. It works with arrays and std::vectors. The code has been compiled with MSVC, clang and g++.

Description

The two functions provided are

  template<class RandomIt>
  void parallelSort(RandomIt begin, RandomIt end, size_t threads = 0) 
  template<class RandomIt, class CF>
  void parallelSort(RandomIt begin, RandomIt end, CF compFunc , size_t threads = 0)

where

begin is an iterator to the start of the data to be sorted.
end is an iterator to the end of the data to be sorted.
threads is an optional parameter to indicate the number of threads to use. The default is set to the hardware_concurrency().
compFunc is an optional pointer to a function that performs the sorting comparison. The default is std::less

In most cases, it is possible to just replace std::sort with parrallelSort to get improves sorting performance.

Example Code

ParallelSortTest.cpp contains the option to to compile the code below to show that it is compatible with std::sort().

// This code was taken from cppreferences website sort example code at https://en.cppreference.com/w/cpp/algorithm/sort
//  The only changes are that std::sort() was replaced with parallelSort.
#include <algorithm>
#include <array>
#include <functional>
#include <iostream>
#include <string_view>
#include "parallelSort.hpp"
int main()
{
  std::array&lt;int, 10> s{ 5, 7, 4, 2, 8, 6, 1, 9, 0, 3 };
  auto print = [&s](std::string_view const rem)
  {
    for (auto a : s)
      std::cout &lt;< a &lt;< ' ';
    std::cout &lt;< ": " &lt;< rem &lt;< '\n';
  };
  parallelSort(s.begin(), s.end());
  print("sorted with the default operator&lt;");
  parallelSort(s.begin(), s.end(), std::greater&lt;int>());
  print("sorted with the standard library compare function object");
  struct
  {
    bool operator()(int a, int b) const { return a &lt; b; }
  }
  customLess;
  parallelSort(s.begin(), s.end(), customLess);
  print("sorted with a custom function object");
  parallelSort(s.begin(), s.end(), [](int a, int b)
    {
      return a > b;
    });
  print("sorted with a lambda expression");
}

Algorithm

My exploration of parallel sorting can be found at https://github.com/johnarobinson77/Explorations-of-Parallel-Merge-Sort. But here is a brief explanation.

parallelSort breaks up the input data into a numer of segments equal to the number of threads. It then sorts each of those segments using std::sort on separate threads, resulting in each of the segments being sorted.

After that, the segments are merged together using a parallel merge algorithm originally developed for GPU sorting by Greenand et al [1]. Each merge uses the total number of specified threads. The segments are iteratively merged into larger and larger segmens until these is only one segment in the original structure.

Performance

The following performance bar graphs show perfprmance with threads from 1 to 20 on a 48 core Graviton3 on AWS. The test size is 16,777,216 elements. Note that the performance increases with each increase in core count. After that the sort is limited by memory bandwidth.

Options

The latest code offers to compile options that control the way threads are allocated in multiprocessing. They are BALANCED_MULTITHREADING mode and MINIMIZED_THREAD_LAUNCH mode. In either case, there are never more threads running than requested, but the number of times those reads are launched changes.

MINIMIZED_THREAD_LAUNCH mode minimizes the number of times threads are launched. This improves performance for smaller sort sizes. The guidance is that if the typical size of the data to be sorted is 1 million or less, it is better to use this mode. The downside is that the work is not as evenly balanced. This is the default mode when no compiler switch is added.

BALANCED_MULTITHREADING has been disabled. It is always slower than MINIMIZED_THREAD_LAUNCH.

References

[1] Greenand, McColl, and Bader. GPU Merge Path: A GPU Merging Algorithm

 Proceedings of the 26th ACM International Conference on Supercomputing

johnarobinson77 / parallelsort Goto Github PK

parallelsort's Introduction

Parallel Sort Code in C++

Description

Example Code

Algorithm

Performance

Options

References

parallelsort's People

Contributors

Watchers

Recommend Projects

React

Vue.js

Typescript

TensorFlow

Django

Laravel

D3

Recommend Topics

javascript

web

server

Machine learning

Visualization

Game

Recommend Org

Facebook

Microsoft

Google

Alibaba

D3

Tencent