Limit std::sort comparisons on small list

David Kastrup dak@gnu.org
Fri Oct 4 21:51:00 GMT 2013


François Dumont <frs.dumont@gmail.com> writes:

> On 10/03/2013 10:59 PM, Paolo Carlini wrote:
>> .. or, if I remember correctly, should we maybe see what happens for
>> a smaller or much smaller number of elements, to see if there is a
>> measurable difference (in number of comparisons) on sort?
>>
>> Thanks,
>> Paolo.
>>
> I ran the sort performance test using only 11 elements, and in this
> case I see:
>
> Without patch:
> sort.cc                      reverse 10 comparisons          0r 0u 0s 0mem 0pf
> sort.cc                      forwards 20 comparisons         0r 0u 0s 0mem 0pf
> sort.cc                      random 30 comparisons           0r 0u 0s 0mem 0pf
>
> With patch:
> sort.cc                      reverse 10 comparisons          0r 0u 0s 0mem 0pf
> sort.cc                      forwards 19 comparisons         0r 0u 0s 0mem 0pf
> sort.cc                      random 29 comparisons           0r 0u 0s 0mem 0pf
>
> so with a limited number of elements we can see the gain, even if for
> the sort algorithm it looks limited to 1 comparison. For stable_sort
> the gain is larger.

The theoretical optimum is lg(n!), namely about 26 comparisons.  In any
efficient sorting algorithm, sorting small sublists with something
primitive like insertion sort does _not_ serve to lower the worst-case
or average number of comparisons, but rather to reduce the
administrative overhead per comparison enough that you gain speed in
_spite_ of slightly _increasing_ the number of comparisons.

So either your numbers here are snake oil caused by random or systematic
flukes, or the implementation of the efficient part of the algorithm is
not overly efficient.

Now if we are talking about an unmodified standard quicksort (not the
modified median-of-three variant!), the best-, average-, and worst-case
comparison counts are 22, 28.5, and 55.  Your "random" case is already
worse than the average for an unmodified qsort, and the unmodified
qsort is worse than the usually applied median-of-three qsort.

The really, _really_ suspicious thing, however, is your best-case
behavior of 10 comparisons either way.  The only way to get that is to
compare each not-yet-viewed element to either the largest or the
smallest number in the sorted set, increasing the size of the sorted
set by 1 with each comparison.  But that means a comparison order that
is incompatible with an efficient sorting scheme, since it chooses
comparisons that yield significantly less than 1 bit of information
each: the probability that the next number lands at the correct end of
the already known range drops well below 0.5 after the first few tries.
Even the best case for a mergesort (which has a non-degenerate worst
case of 29 comparisons) is 17 comparisons.  10 is way too low to make
sense for efficient sorts.

What's the mean number of comparisons for insertion sort?  For a list
of n elements, call it mi(n); then:

mi(2) = 1
mi(n+1) = mi(n) + (1/(n+1)) (sum(i=1, n+1, i) - 1)
        = mi(n) + n/2 + 1 - (1/(n+1))

So mi(11) is about 11 + 55/2 - ln(11), roughly 36.  Calculating
exactly delivers about 35.5 comparisons.  Pretty good estimate.

But that means that with just 11 numbers, you are on pretty thin ice
when it comes to detecting an insertion-sort-quality contender among
more efficient variants.

Better use numbers like 10000 or so.  That's still small, but much
less so.

-- 
David Kastrup



More information about the Libstdc++ mailing list