RFR: 8325679: Optimize ArrayList subList sort
Attila Szegedi
attila at openjdk.org
Tue Feb 13 12:37:53 UTC 2024
On Tue, 13 Feb 2024 03:55:37 GMT, Stuart Marks <smarks at openjdk.org> wrote:
>> Somewhat surprisingly, `ArrayList$Sublist.sort()` is not specialized and will thus fall back to slower default method of `List.sort()` instead of sorting a range of the array in-place in its backing root `ArrayList`.
>>
>> This doesn't change observable behavior, so haven't added tests, and `tier1` tests still all pass except for `test/jdk/java/util/Locale/LocaleProvidersFormat.java` which also currently fails on master too on the machine I tested on.
>
> @szegedi Oh interesting catch. Looks like a reasonable optimization. Is this covered by a test, or should a new test be added?
>
> @JimLaskey The `List.sort` method was added in JDK 8, so not really day one. Prior to that one had to call `Collections.sort` which copied to a temp array, sorted, then copied back; this applied equally to full lists and sublists. Since JDK 8, `ArrayList.sort` on the full list did sorting in place but sorting a subList just inherited the default method (which uses the old copy-to-temp-and-sort technique). My guess is that nobody noticed this because sublists aren't used that much, and sorting of subarrays, while arguably necessary for completeness, probably aren't used all that much either.
@stuart-marks you're right that sorting of sublists is rare. I did run into a need for it a few months ago in my day job though, that's how I stumbled upon this. Granted, it was a performance optimization and not a correctness necessity. See, I have a large list of data I read from an external location. Some elements – in at most one identifiable but arbitrarily large stride in the large list – need some extra processing. The algorithm for this processing can be written to be more efficient if it can presume those elements are sorted.
I could still sort the entire list (as the ordering doesn't matter to the final receiver), using a sort that just compares irrelevant elements as equal, so they'd remain as large "already sorted" strides, and only apply the real sort on only the relevant elements:
static int fullListCompare(Data d1, Data d2) {
if (isRelevant(d1)) {
if (isRelevant(d2)) {
return sublistCompare(d1, d2);
} else {
return -1;
}
} else if (isRelevant(d2)) {
return 1;
}
return 0;
}
This'd also have the side effect of moving all the relevant entries to the beginning of the list, but while that's not a problem, it's not necessary either. For this reason, I prefer to scan the list for the stride, reify it as a sublist, and only run a sort with `sublistCompare` on it, and then also do the extra processing of the elements on the sublist too.
So yeah, it is rare this pops up, but it does happen :-)
I'm not sure if there's a test that already exercises sublist sorts. I did find what seems like a setup for a [TCK test for ArrayList](https://github.com/openjdk/jdk/blob/master/test/jdk/java/util/concurrent/tck/ArrayListTest.java) that seems like it is specifically creating a test suite for sublists too, but I haven't looked more closely into how is that supposed to work. I'm happy to write some tests specifically for ArrayList sublist sort if this TCK one doesn't exercise it.
-------------
PR Comment: https://git.openjdk.org/jdk/pull/17818#issuecomment-1941426905
More information about the core-libs-dev
mailing list