sorting and stability

Wed Mar 27 09:05:15 PDT 2013

On 03/25/13 19:30, Doug Lea wrote:
> It struck me that if the stream is ORDERED, then people
> have a reasonable expectation that the sort will be stable.
> Else not.
>
> Agreed?

Apparently everyone does.

>
> Algorithmically, the cost of this might on average be zero.
> Even though it does add cost in general, it can be made
> immeasurably small when there are no duplicate keys, and
> as fast as any other strategy when there are duplicates.

This still holds for versions ready enough to tentatively
commit. Paul will help integrate into lambda repo within
a few days.

A few other notes on implementation:

* While I was at it, I tested split thresholds more
carefully. Unsurprisingly (in retrospect), you need
values big enough to outweigh memory contention across
tasks, which (because of cardmarks) is bigger than you'd
like -- it is now set to 8K.

* The previous versions required temp workspace
arrays as large as source array, even if only a
portion was being sorted. Now they don't.

* The new versions use CountedCompleters, which
makes them stall much less on GC etc.

* The new versions bypass par sort entirely
if size < threshold, which bypasses needless workspace
array allocation.

* I ran these all through a modified version of
the jtreg "Sorting" test that replaces "sort"
with "parallelSort" everywhere (plus uses larger
array sizes to test), that includes stability checks.
Someone might want to add something like this to jtreg.

-Doug