timsort / cpp-timsort Goto Github PK

View Code? Open in Web Editor NEW

291.0 19.0 45.0 263 KB

A C++ implementation of timsort

License: MIT License

C++ 91.87% CMake 8.13%

sort c-plus-plus sorting algorithm

cpp-timsort's Introduction

TimSort

A C++ implementation of TimSort, an O(n log n) stable sorting algorithm, ported from Python's and OpenJDK's.

See also the following links for a detailed description of TimSort:

This version of the library requires at least C++20. Older versions of the library, available in different branches, offer support for older standards though implement fewer features:

Branch 2.x.y is compatible with C++11, and is slightly more permissive in some reagard due to the lack of concepts to constrain its interface (it notaby supports iterators without postfix operator++/--).
Branch 1.x.y is compatible with C++03. Older versions are not actively maintained anymore. If you need extended support for those, please open specific issues for the problems you want solved.

According to the benchmarks, gfx::timsort is slower than std::ranges::sort on randomized sequences, but faster on partially-sorted ones. It can be used as a drop-in replacement for std::ranges::stable_sort, with the difference that it can't fall back to a O(n log² n) algorithm when there isn't enough extra heap memory available.

Merging sorted ranges efficiently is an important part of the TimSort algorithm. This library exposes gfx::timmerge in the public API, a drop-in replacement for std::ranges::inplace_merge with the difference that it can't fall back to a O(n log n) algorithm when there isn't enough extra heap memory available. According to the benchmarks, gfx::timmerge is slower than std::ranges::inplace_merge on heavily/randomly overlapping subranges of simple elements, but it is faster for complex elements such as std::string, and on sparsely overlapping subranges.

The ibrary exposes the following functions in namespace gfx:

// timsort

template <
    std::random_access_iterator Iterator,
    std::sentinel_for<Iterator> Sentinel,
    typename Compare = std::ranges::less,
    typename Projection = std::identity
>
    requires std::sortable<Iterator, Compare, Projection>
auto timsort(Iterator first, Sentinel last,
             Compare compare={}, Projection projection={})
    -> Iterator;

template <
    std::ranges::random_access_range Range,
    typename Compare = std::ranges::less,
    typename Projection = std::identity
>
    requires std::sortable<std::ranges::iterator_t<Range>, Compare, Projection>
auto timsort(Range &range, Compare compare={}, Projection projection={})
    -> std::ranges::borrowed_iterator_t<Range>;

// timmerge

template <
    std::random_access_iterator Iterator,
    std::sentinel_for<Iterator> Sentinel,
    typename Compare = std::ranges::less,
    typename Projection = std::identity
>
    requires std::sortable<Iterator, Compare, Projection>
auto timmerge(Iterator first, Iterator middle, Sentinel last,
              Compare compare={}, Projection projection={})
    -> Iterator;

template <
    std::ranges::random_access_range Range,
    typename Compare = std::ranges::less,
    typename Projection = std::identity
>
    requires std::sortable<std::ranges::iterator_t<Range>, Compare, Projection>
auto timmerge(Range &&range, std::ranges::iterator_t<Range> middle,
              Compare compare={}, Projection projection={})
    -> std::ranges::borrowed_iterator_t<Range>;

EXAMPLE

Example of using timsort with a defaulted comparison function and a projection function to sort a vector of strings by length:

#include <string>
#include <vector>
#include <gfx/timsort.hpp>

size_t len(const std::string& str) {
    return str.size();
}

// Sort a vector of strings by length
std::vector<std::string> collection = { /* ... */ };
gfx::timsort(collection, {}, &len);

INSTALLATION & COMPATIBILITY

The library is tested with the following compilers:

Ubuntu: GCC 10, Clang 11
Windows: MSVC 19.37.32826.1, MinGW-w64 GCC 12
MacOS: GCC 10, Clang 17

The library can be installed on the system via CMake (at least 3.14) with the following commands:

cmake -S . -B build -DCMAKE_BUILD_TYPE=Release
cmake --install build

Alternatively the library is also available Conan Center and can be directly installed in your local Conan cache with the following command:

conan install --requires=timsort/3.0.0

DIAGNOSTICS & INFORMATION

The following configuration macros allow gfx::timsort and gfx::timmerge to emit diagnostics, which can be helpful to diagnose issues:

Defining GFX_TIMSORT_ENABLE_ASSERT light inserts assertions in key locations in the algorithm to avoid logic errors.
Defining GFX_TIMSORT_ENABLE_AUDIT inserts assertions that verify pre- and postconditions. These verifications can slow the algorithm down significantly. Enable the audits only while testing or debugging. Enabling audits automatically enables lighter assertions too.
Defining GFX_TIMSORT_ENABLE_LOG inserts logs in key locations, which allow to follow more closely the flow of the algorithm.

cpp-TimSort follows semantic versioning and provides the following macros to retrieve the current major, minor and patch versions:

GFX_TIMSORT_VERSION_MAJOR
GFX_TIMSORT_VERSION_MINOR
GFX_TIMSORT_VERSION_PATCH

TESTS

The tests are written with Catch2 and can be compiled with CMake and run through CTest.

When using the project's main CMakeLists.txt, the CMake option BUILD_TESTING is ON by default unless the project is included as a subdirectory. The following CMake options are available to change the way the tests are built with CMake:

GFX_TIMSORT_USE_VALGRIND: if ON, the tests will be run through Valgrind (OFF by default)
GFX_TIMSORT_SANITIZE: this variable takes a comma-separated list of sanitizers options to run the tests (empty by default)

BENCHMARKS

Benchmarks are available in the benchmarks subdirectory, and can be constructed directly by passing the option -DBUILD_BENCHMARKS=ON to CMake during the configuration step.

Example bench_sort output (timing scale: sec.):

c++ -v
Apple LLVM version 7.0.0 (clang-700.0.72)
Target: x86_64-apple-darwin14.5.0
Thread model: posix
c++ -I. -Wall -Wextra -g  -DNDEBUG -O2 -std=c++11 example/bench.cpp -o .bin/bench
./.bin/bench
RANDOMIZED SEQUENCE
[int]
size	100000
std::sort        0.695253
std::stable_sort 0.868916
timsort          1.255825
[std::string]
size	100000
std::sort        3.438217
std::stable_sort 4.122629
timsort          5.791845
REVERSED SEQUENCE
[int]
size	100000
std::sort        0.045461
std::stable_sort 0.575431
timsort          0.019139
[std::string]
size	100000
std::sort        0.586707
std::stable_sort 2.715778
timsort          0.345099
SORTED SEQUENCE
[int]
size	100000
std::sort        0.021876
std::stable_sort 0.087993
timsort          0.008042
[std::string]
size	100000
std::sort        0.402458
std::stable_sort 2.436326
timsort          0.298639

Example bench_merge output (timing scale: milliseconds; omitted detailed results for different middle iterator positions, reformatted to improve readability):

c++ -v
Using built-in specs.
...
Target: x86_64-pc-linux-gnu
...
gcc version 10.2.0 (GCC)
c++ -I ../include -Wall -Wextra -g -DNDEBUG -O2 -std=c++11 bench_merge.cpp -o bench_merge
./bench_merge
size	100000
element type\algorithm:      	std::inplace_merge	timmerge
RANDOMIZED SEQUENCE
[int] approx. average        	 33.404430        	 37.047990
[std::string] approx. average	324.964249        	210.297207
REVERSED SEQUENCE
[int] approx. average        	 11.441404        	  4.017482
[std::string] approx. average	305.649503        	114.773898
SORTED SEQUENCE
[int] approx. average        	  4.291098        	  0.105571
[std::string] approx. average	158.238114        	  0.273858

Detailed bench_merge results for different middle iterator positions can be found at https://github.com/timsort/cpp-TimSort/wiki/Benchmark-results

cpp-timsort's People

Contributors

Stargazers

Watchers

cpp-timsort's Issues

Look into the Java's TimSort problem (JDK-8072909)

OpenJDK’s java.utils.Collection.sort() is broken: The good, the bad and the worst case | Envisage: Engineering Virtualized Services

Formal verification issue

Just a question, has this implementation address the 2015 issue found during formal verification?
https://en.wikipedia.org/wiki/Timsort#Formal_verification

I'm not familiar enough with the algorithm to tell, at this stage.

Valgrind reports issues when trying to sort std::deque

I am still using your timsort implementation in a project, but recently noticed that Valgrind sometimes complained when trying to sort an std::deque with timsort. Here is my reduced test case:

std::deque<double> collection(35);
std::iota(std::begin(collection), std::end(collection), -47.0);
std::mt19937 engine(std::time(nullptr));
std::shuffle(std::begin(collection), std::end(collection), engine);

gfx::timsort(std::begin(collection), std::end(collection));
assert(std::is_sorted(std::begin(collection), std::end(collection)));

The actual test case is slightly more complicated than that since I am using Catch, but nothing really different. Unfortunately, Valgrind's error is way too big for me to include it here, but you can find the full error log here. Now, note that it is a log obtained with the test compiled by g++ in release mode, so it lacks some information about where exactly the error happens.

I will try to find a sequence of numbers that makes Valgrind report errors for sure, and edit this issue when I have one. Currently I don't really have many ideas about what can cause the error. I got similar errors when trying to sort an std::deque with another sorting algorithm: apparently, std::deque is less cool than std::vector when it comes to accessing out-of-bound iterators (not even dereferencing them), so that might be the culprit.

Update: some things in the original testcase such as the shuffling or the fact that I used double were a bit dumb/unneeded. I managed to find a sequence of integers that always triggers the Valgrind errors and cleaned the minimal testcase a bit:

std::deque<int> collection = {
    15, 7, 16, 20, 25, 28, 13, 27, 34, 24, 19, 1,
    6, 30, 32, 29, 10, 9, 3, 31, 21, 26, 8, 2, 22,
    14, 4, 12, 5, 0, 23, 33, 11, 17, 18
};

gfx::timsort(std::begin(collection), std::end(collection));
assert(std::is_sorted(std::begin(collection), std::end(collection)));

Now that we have an initial sequence that always triggers the Valgrind error, here are the corresponding error logs for the program compiler in debug mode with clang++ and g++. Most of them seem to come from mergeHi but I also remember having seen errors coming from mergeAt and gallopRight in older logs.

pending_ can reserve a fixed size of space

instead of dynamic allocating memory, the stack size can be pre-defined by using vector::max_size, which may save a little bit time, correct me if I am wrong.

support vector of pairs

I found your implementation of tim sort via a stackoverflow question.
Would it be possible to support vector of pairs in your code?
Currently I get the following compiler message when trying with a vector if <double, int> pairs.

error: no matching function for call to ‘timsort(std::vector<std::pair<double, int> >::iterator, <unresolved overloaded function type>)’
      gfx::timsort(vector_pair.begin(), vector_pair.end)

with regards
christian

Recruiting new maintainers

Because I am not using C++ in production anymore, I'd like to transfer this project to another organization with new maintainers.

However, there're security incidents these days in which attackers inject malware to modules by new maintainers:

https://blog.npmjs.org/post/180565383195/details-about-the-event-stream-incident

So I'd like to review the org to be transferred carefully, so please tell me the details.

Failure with MSVC debug iterators

While updating the library tooling I added tests for MSVC, whose STL uses checked iterators in debug mode, which don't allow some operations, and notably creating iterators outside of the [begin, end] range. It turns out that this implementation of timsort sometimes accesses iterators outside of this range, which should probably be avoided: while it is most likely safe for contiguous iterators as long as the out-of-bounds iterators aren't dereferenced, it can cause issues with other random-access iterators.

I believe that #14 and Morwenn/cpp-sort#89 - two cases of Valgrind finding issues with std::deque iterators - are linked to this.

I will try to investigate this issue once I'm done with the tooling.

Large number of duplicates

Hi,

I'm sorting data with a large number of duplicates (e.g. 95% duplicated) and wondering if timsort can be (or has been) optimized to remove duplicates during the search ? Thanks.

Duplicate records after sort

I am investigating why after sorting some entries in my array get overwritten with duplicates. Basically, if I start out with an array of size 500k and no duplicates (well there maybe duplicates from a key perspective), it appears that after running I end up with the same array size, but some entries have been duplicated and others are gone.

I caught the problem with a sanity check where I ran through the array checking that each element was greater than or equal to its predecessor. I have tried running in debug mode and did not see any assert failures.

My use case is a little complicated, but not terribly. I have an array of plain old data objects and an array of ints which are to be a sorted index of the data in the data array. The int array is filled with the index of each data object in the main array. Using a custom compare I would like to sort the int array.

It may still turn out to be a bug in my code, but the fact that it works with std::sort, std::stable_sort, and a version of SmoothSort made me think it worth asking if you think this may be a bug in the timsort impl.

Below is some psuedo code.

// this seems to work
std::sort( index_array.begin(), index_array.end(), compareHelper(data_array));  

// this seems to consistently run into problems
gfx::timsort( index_array.begin(), index_array.end(), compareHelper(data_array)); 

-------------------------------------
class data {
    int  col1;
    int  col 2; 
    int  col 3;
}

std::vector< data >  data_array;
std::vector< uint32_t >      index_array;

insert_data ( data d )
{
       data_array.push_back(d);
       index_array.push_back( data_array.size()-1);
}

struct  compareHelper
{
   const std::vector< data >& mArray;
   compareHelper( const data_array& aArray )
      :  mArray( aArray) {};

   bool operator()(const uint32_t x, const uint32_t y) const
   {
      const data& a = mArray[x];
      const data& b = mArray[y];

      if (  a.col1 == b.col1 )
      {
         return a.col2 < b.col2;
      }
      return a.col1 < b.col1;
   };
}

vector of pairs with reference

Hi, I seem to be having an issue with pairs. I start with a vector with 4 elements
(1,a) (3,c) (4,d) (2,b)
But then when I try to sort them I get 'd' being repeated and 'b' is gone
(1,a) (2,d) (3,c) (4,d)
This doesn't happen when the second value in the pair is a string (I want a string&). Here is the code I use

int main() {
  typedef std::pair<int, std::string&> KV;
  std::vector<KV> keys;
  keys.reserve(8);
  std::vector<std::string> vals;
  vals.reserve(8);
  vals.push_back("a");
  keys.push_back(static_cast<KV>(KV(1, vals.back())));
  vals.push_back("c");
  keys.push_back(static_cast<KV>(KV(3, vals.back())));
  vals.push_back("d");
  keys.push_back(static_cast<KV>(KV(4, vals.back())));
  vals.push_back("b");
  keys.push_back(static_cast<KV>(KV(2, vals.back())));
  std::cout << std::endl;
  for (auto s : keys) {
    std::cout << "(" << s.first << "," << s.second << ") ";
  }
  std::cout << std::endl;
  timsort(keys.begin(),
                 keys.end(),
                 [](const KV& a, const KV& b) {
                   return a.first < b.first;
                 });
  for (auto s : keys) {
    std::cout << "(" << s.first << "," << s.second << ") ";
  }
  std::cout << std::endl;
  return 0;
}

Adding TimSort to Boost.Sort

Hello.

I want to add your TimSort's implementation to Boost.Sort library. Can you change license to Boost? Or can i copy your source, rework it in some way (add new constructors, etc. ), change license from MIT to Boost, add your name to authors and send it to Boost.Sort?

Thanks.

Proposal: allow use of "tristate" compare functor

Background:

Like any well-behaved C++ template algorithm, gfx::timsort takes a comparison functor in the form of std::less(a, b) -- i.e. take two objects and return a bool indicating if a<b

For most of the algorithm this works well, the exception being gallopLeft() with its calls to gt()/le() These have to be implemented by two calls to the comparison functor. For a simple value type like int the compiler will easily optimize this into a single branch. However for something more complicated (and more expensive to compare!) like std::string this can end up doing two comparisons of the same elements.

For algorithms like this a qsort()-like comparison functor (i.e. strcmp()/memcmp()/std::string::compare()/etc) is actually more efficient since it gives you more than a simple bool result. Since the comparison functor gives you more information you can call it fewer times.

Proposal:

It's good that the standard interface of gfx::timsort matches std::stable_sort and takes a std::less-based function. However, since your code already abstracts these comparisons via the template <typename Value, typename LessFunction> class Compare helper class, why not provide an alternate function that takes a qsort-like functor instead? Just as a proof of concept I tried this:

diff --git a/timsort.hpp b/timsort.hpp
index 77dd7a2..e6230c0 100644
--- a/timsort.hpp
+++ b/timsort.hpp
@@ -126,12 +126,46 @@ template <typename Value, typename LessFunction> class Compare {
     func_type less_;
 };

-template <typename RandomAccessIterator, typename LessFunction> class TimSort {
+template <typename Value, typename TristateFunction> class CompareFromTristate {
+  public:
+    typedef Value value_type;
+    typedef TristateFunction func_type;
+
+    CompareFromTristate(TristateFunction f) : tristate_(f) {
+    }
+    CompareFromTristate(const CompareFromTristate<value_type, func_type> &other) : tristate_(other.tristate_) {
+    }
+
+    bool lt(value_type x, value_type y) {
+        return tristate_(x, y) < 0;
+    }
+    bool le(value_type x, value_type y) {
+        return tristate_(x, y) <= 0;
+    }
+    bool gt(value_type x, value_type y) {
+        return tristate_(x, y) > 0;
+    }
+    bool ge(value_type x, value_type y) {
+        return tristate_(x, y) >= 0;
+    }
+
+    // Allow this to be passed to APIs like std::lower_bound
+    CompareFromTristate &less_function() {
+        return *this;
+    }
+    bool operator()(value_type x, value_type y) {
+        return lt(x, y);
+    }
+
+  private:
+    func_type tristate_;
+};
+
+template <typename RandomAccessIterator, typename compare_t> class TimSort {
     typedef RandomAccessIterator iter_t;
     typedef typename std::iterator_traits<iter_t>::value_type value_t;
     typedef typename std::iterator_traits<iter_t>::reference ref_t;
     typedef typename std::iterator_traits<iter_t>::difference_type diff_t;
-    typedef Compare<const value_t &, LessFunction> compare_t;

     static const int MIN_MERGE = 32;

@@ -245,7 +279,7 @@ template <typename RandomAccessIterator, typename LessFunction> class TimSort {
         return n + r;
     }

-    TimSort(compare_t c) : comp_(c), minGallop_(MIN_GALLOP) {
+    TimSort(const compare_t& c) : comp_(c), minGallop_(MIN_GALLOP) {
     }

     void pushRun(iter_t const runBase, diff_t const runLen) {
@@ -661,6 +695,7 @@ template <typename RandomAccessIterator, typename LessFunction> class TimSort {

     // the only interface is the friend timsort() function
     template <typename IterT, typename LessT> friend void timsort(IterT first, IterT last, LessT c);
+    template <typename IterT, typename TristateT> friend void timsort_tristate(IterT first, IterT last, TristateT c);
 };

 template <typename RandomAccessIterator>
@@ -671,7 +706,18 @@ inline void timsort(RandomAccessIterator const first, RandomAccessIterator const

 template <typename RandomAccessIterator, typename LessFunction>
 inline void timsort(RandomAccessIterator const first, RandomAccessIterator const last, LessFunction compare) {
-    TimSort<RandomAccessIterator, LessFunction>::sort(first, last, compare);
+    typedef typename std::iterator_traits<RandomAccessIterator>::value_type value_type;
+    typedef typename gfx::Compare<const value_type &, LessFunction> compare_type;
+    compare_type const compare_from_less(compare);
+    TimSort<RandomAccessIterator, compare_type>::sort(first, last, compare_from_less);
+}
+
+template <typename RandomAccessIterator, typename TristateFunction>
+inline void timsort_tristate(RandomAccessIterator const first, RandomAccessIterator const last, TristateFunction compare_tristate) {
+    typedef typename std::iterator_traits<RandomAccessIterator>::value_type value_type;
+    typedef typename gfx::CompareFromTristate<const value_type &, TristateFunction> compare_type;
+    compare_type const compare_from_tristate(compare_tristate);
+    TimSort<RandomAccessIterator, compare_type>::sort(first, last, compare_from_tristate);
 }

 } // namespace gfx

On a quick test of 10^7 integers from rand() (so lots of duplicates but no sorted runs in the input) this caused about a 1.1% drop in the number of comparisons needed... so for an "expensive to compare, cheap to move" data type you might get a ~1% improvement in overall sort speed (although I didn't do extensive benchmarking of this). Not a huge improvement, but it's something.

The merge_cxx_11 tests are not run

I just realized that but it seems that the tests from merge_cxx_11_tests.cpp are not run locally nor in the continuous integration service. It's probably time to find where the issue started and how to fix it.

namespace gfx

I just realized that the namespace gfx has two million hits on GitHub alone: it is most likely a common abbreviation for "graphics" and it's used in big engines (it notably appears in the Fuchsia source code).

I'm not sure what to do about this. We could probably play it safe and use namespace timsort instead, which apparently doesn't exist. That would give a timsort::timsort(...) function, a timsort/timsort.hpp header to include and a timsort::timsort CMake target. It doesn't feel like the coolest solution, but it's probably pragmatic enough.

Incorrect minRunLength() computing minRun to be too small

The following line seems incorrect.

Since MIN_MERGE is 32, the loop would end with n containing the 5 most significant bits instead of 6 most significant bits.
Therefore, the function would return [0, 32], which is less than what Python is doing here

The Python detailed description of Timsort in the readme here also corresponds to returning a size of [32, 64] when possible.