Why is builtin sorted() slower for a list containing descending numbers if each number appears twice consecutively?

Question

As alluded to in the comments by btilly and Amadan, this is due to how the Timsort sorting algorithm works. Detailed description of the algorithm is here.

Timsort speeds up operation on partially sorted arrays by identifying runs of sorted elements.

A run is either
“ascending”, which means non-decreasing:

a0 <= a1 <= a2 <= …

or “descending”, which means strictly decreasing:

a0 > a1 > a2 > …

Note that a run is always at least 2 long, unless we start at the array’s
last element.

Your arrays a, b and c each consist of just one run.
The array d has 1 million runs.

The reason why the descending run cannot be >= is to make the sort stable, i.e. keep the order of equal elements:

The definition of descending is strict, because the main routine reverses
a descending run in-place, transforming a descending run into an ascending
run. Reversal is done via the obvious fast “swap elements starting at each
end, and converge at the middle” method, and that can violate stability if
the slice contains any equal elements. Using a strict definition of
descending ensures that a descending run contains distinct elements.

Python 3.11 has slightly improved version of timsort, sometimes called powersort, but it uses the same run detection and thus has the same performance.

Leave a Comment Cancel reply