Giter VIP home page Giter VIP logo

parallelsort's Introduction

Parallel Sort Code in C++

The code provided in this repository implements a parallel sort algorithm that follows the same conventions as std::sort(). It works well on large data sets, and performance improves with any number of threads up to the point where memory bandwidth is the limit. It works with arrays and std::vectors. The code has been compiled with MSVC, clang and g++.

Description

The two functions provided are

  template<class RandomIt>
  void parallelSort(RandomIt begin, RandomIt end, size_t threads = 0) 
  template<class RandomIt, class CF>
  void parallelSort(RandomIt begin, RandomIt end, CF compFunc , size_t threads = 0)
  

where

  • begin is an iterator to the start of the data to be sorted.
  • end is an iterator to the end of the data to be sorted.
  • threads is an optional parameter to indicate the number of threads to use. The default is set to the hardware_concurrency().
  • compFunc is an optional pointer to a function that performs the sorting comparison. The default is std::less

In most cases, it is possible to just replace std::sort with parrallelSort to get improves sorting performance.

Example Code

ParallelSortTest.cpp contains the option to to compile the code below to show that it is compatible with std::sort().

// This code was taken from cppreferences website sort example code at https://en.cppreference.com/w/cpp/algorithm/sort
//  The only changes are that std::sort() was replaced with parallelSort.
#include <algorithm>
#include <array>
#include <functional>
#include <iostream>
#include <string_view>
#include "parallelSort.hpp"
int main()
{
  std::array&lt;int, 10> s{ 5, 7, 4, 2, 8, 6, 1, 9, 0, 3 };
  auto print = [&s](std::string_view const rem)
  {
    for (auto a : s)
      std::cout &lt;< a &lt;< ' ';
    std::cout &lt;< ": " &lt;< rem &lt;< '\n';
  };
  parallelSort(s.begin(), s.end());
  print("sorted with the default operator&lt;");
  parallelSort(s.begin(), s.end(), std::greater&lt;int>());
  print("sorted with the standard library compare function object");
  struct
  {
    bool operator()(int a, int b) const { return a &lt; b; }
  }
  customLess;
  parallelSort(s.begin(), s.end(), customLess);
  print("sorted with a custom function object");
  parallelSort(s.begin(), s.end(), [](int a, int b)
    {
      return a > b;
    });
  print("sorted with a lambda expression");
}

Algorithm

My exploration of parallel sorting can be found at https://github.com/johnarobinson77/Explorations-of-Parallel-Merge-Sort. But here is a brief explanation.

parallelSort breaks up the input data into a numer of segments equal to the number of threads. It then sorts each of those segments using std::sort on separate threads, resulting in each of the segments being sorted.

After that, the segments are merged together using a parallel merge algorithm originally developed for GPU sorting by Greenand et al [1]. Each merge uses the total number of specified threads. The segments are iteratively merged into larger and larger segmens until these is only one segment in the original structure.

Performance

The following performance bar graphs show perfprmance with threads from 1 to 20 on a 48 core Graviton3 on AWS. The test size is 16,777,216 elements. Note that the performance increases with each increase in core count. After that the sort is limited by memory bandwidth.

Integer Sort Performance String Sort Performance

Options

The latest code offers to compile options that control the way threads are allocated in multiprocessing. They are BALANCED_MULTITHREADING mode and MINIMIZED_THREAD_LAUNCH mode. In either case, there are never more threads running than requested, but the number of times those reads are launched changes.

MINIMIZED_THREAD_LAUNCH mode minimizes the number of times threads are launched. This improves performance for smaller sort sizes. The guidance is that if the typical size of the data to be sorted is 1 million or less, it is better to use this mode. The downside is that the work is not as evenly balanced. This is the default mode when no compiler switch is added.

BALANCED_MULTITHREADING has been disabled. It is always slower than MINIMIZED_THREAD_LAUNCH.

References

[1] Greenand, McColl, and Bader. GPU Merge Path: A GPU Merging Algorithm

 Proceedings of the 26th ACM International Conference on Supercomputing

parallelsort's People

Contributors

johnarobinson77 avatar

Watchers

 avatar

Recommend Projects

  • React photo React

    A declarative, efficient, and flexible JavaScript library for building user interfaces.

  • Vue.js photo Vue.js

    ๐Ÿ–– Vue.js is a progressive, incrementally-adoptable JavaScript framework for building UI on the web.

  • Typescript photo Typescript

    TypeScript is a superset of JavaScript that compiles to clean JavaScript output.

  • TensorFlow photo TensorFlow

    An Open Source Machine Learning Framework for Everyone

  • Django photo Django

    The Web framework for perfectionists with deadlines.

  • D3 photo D3

    Bring data to life with SVG, Canvas and HTML. ๐Ÿ“Š๐Ÿ“ˆ๐ŸŽ‰

Recommend Topics

  • javascript

    JavaScript (JS) is a lightweight interpreted programming language with first-class functions.

  • web

    Some thing interesting about web. New door for the world.

  • server

    A server is a program made to process requests and deliver data to clients.

  • Machine learning

    Machine learning is a way of modeling and interpreting data that allows a piece of software to respond intelligently.

  • Game

    Some thing interesting about game, make everyone happy.

Recommend Org

  • Facebook photo Facebook

    We are working to build community through open source technology. NB: members must have two-factor auth.

  • Microsoft photo Microsoft

    Open source projects and samples from Microsoft.

  • Google photo Google

    Google โค๏ธ Open Source for everyone.

  • D3 photo D3

    Data-Driven Documents codes.