RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v9]

iaroslavski duke at openjdk.org
Sun Oct 22 16:00:32 UTC 2023


On Wed, 20 Sep 2023 16:33:56 GMT, Paul Sandoz <psandoz at openjdk.org> wrote:

> > Hi Paul (@PaulSandoz), Alan (@AlanBateman), Any update? Do you agree with Radix sort in parallel case only?
> 
> I think its definitely a better fit, but another aspect of my previous comment was wondering if we need a radix sort if the vectorized quicksort implementation is fast enough. IMO we need to compare performance results with the vectorized quick sort, and be aware of future enhancements to that.

Hi Paul (@PaulSandoz), Alan (@AlanBateman),

PR with AVX512 improvements has been integrated and I compared current JDK sorting
with my suggested changes. We agreeded to invoke Radix sort in parallel sorting only,
but sequentiual soring (even without Radix sort) shows upto ~ 5-20% speedup
as shown in the performance data below.

Arrays.sort benchmark | Data Type | Array Size | Baseline (us/op) | New version (us/op) | Speedup
-- | -- | -- | -- | -- | --
ArraysSort.intSort |   RANDOM |     600 |      9,016 |      8,426 |  1.07 
ArraysSort.intSort |   RANDOM |    9000 |    526,099 |    499,479 |  1.05 
ArraysSort.intSort |   RANDOM |   20000 |   1283,544 |   1194,464 |  1.07 
ArraysSort.intSort |   RANDOM |  400000 |  32547,730 |  30314,660 |  1.07 
ArraysSort.intSort |   RANDOM | 3000000 | 287803,223 | 274460,275 |  1.05 
ArraysSort.intSort | REPEATED |     600 |      2,041 |      1,877 |  1.09 
ArraysSort.intSort | REPEATED |    9000 |    101,790 |     94,416 |  1.08 
ArraysSort.intSort | REPEATED |   20000 |    262,945 |    210,241 |  1.25 
ArraysSort.intSort | REPEATED |  400000 |   4560,917 |   4242,648 |  1.08 
ArraysSort.intSort | REPEATED | 3000000 |  36925,218 |  33656,517 |  1.10 
ArraysSort.intSort |  STAGGER |     600 |      9,348 |      3,711 |  2.52 
ArraysSort.intSort |  STAGGER |    9000 |     54,833 |     50,308 |  1.09 
ArraysSort.intSort |  STAGGER |   20000 |    124,794 |    110,447 |  1.13 
ArraysSort.intSort |  STAGGER |  400000 |   2623,982 |   2325,099 |  1.13 
ArraysSort.intSort |  STAGGER | 3000000 |  19688,017 |  16812,050 |  1.17 
ArraysSort.intSort |  SHUFFLE |     600 |      5,361 |      4,960 |  1.08 
ArraysSort.intSort |  SHUFFLE |    9000 |    208,574 |    171,812 |  1.21 
ArraysSort.intSort |  SHUFFLE |   20000 |    442,811 |    401,294 |  1.10 
ArraysSort.intSort |  SHUFFLE |  400000 |  10048,756 |   9313,178 |  1.08 
ArraysSort.intSort |  SHUFFLE | 3000000 |  77659,359 |  65154,186 |  1.19 

Parallel soring (with Radix sort) shows upto x2.1-3.6 speedup,
please, see benchamrking data below.

Arrays.sort benchmark | Data Type | Array Size | Baseline (us/op) | New version (us/op) | Speedup
-- | -- | -- | -- | -- | --
ArraysSort.intParallelSort |   RANDOM |     600 |     9,036 |     8,350 | 1.08  
ArraysSort.intParallelSort |   RANDOM |    9000 |   366,551 |   126,962 | 2.89  
ArraysSort.intParallelSort |   RANDOM |   20000 |   497,725 |   193,564 | 2.57  
ArraysSort.intParallelSort |   RANDOM |  400000 |  8096,295 |  2711,692 | 2.99  
ArraysSort.intParallelSort |   RANDOM | 3000000 | 68663,019 | 19060,058 | 3.60  
ArraysSort.intParallelSort | REPEATED |     600 |     2,059 |     1,878 | 1.10  
ArraysSort.intParallelSort | REPEATED |    9000 |    80,267 |    70,176 | 1.14  
ArraysSort.intParallelSort | REPEATED |   20000 |   250,379 |   109,370 | 2.29  
ArraysSort.intParallelSort | REPEATED |  400000 |  4571,467 |  1469,696 | 3.11  
ArraysSort.intParallelSort | REPEATED | 3000000 | 30484,679 | 11636,387 | 2.62  
ArraysSort.intParallelSort |  STAGGER |     600 |     9,676 |     3,700 | 2.62  
ArraysSort.intParallelSort |  STAGGER |    9000 |    54,892 |    52,106 | 1.05  
ArraysSort.intParallelSort |  STAGGER |   20000 |   105,592 |    73,180 | 1.44  
ArraysSort.intParallelSort |  STAGGER |  400000 |  1766,893 |   765,052 | 2.31  
ArraysSort.intParallelSort |  STAGGER | 3000000 | 11912,261 |  6759,822 | 1.76  
ArraysSort.intParallelSort |  SHUFFLE |     600 |     5,417 |     4,974 | 1.09  
ArraysSort.intParallelSort |  SHUFFLE |    9000 |   121,579 |    71,555 | 1.70  
ArraysSort.intParallelSort |  SHUFFLE |   20000 |   236,334 |   111,430 | 2.12  
ArraysSort.intParallelSort |  SHUFFLE |  400000 |  3108,046 |  1075,329 | 2.89  
ArraysSort.intParallelSort |  SHUFFLE | 3000000 | 27130,857 | 11791,105 | 2.30  

New version is good enough: doesn't effect on memory usage for Arrays.osrt()
and shows the boost of performance on parallel sorting.

Paul, Alan,
Are you fine with this PR?

-------------

PR Comment: https://git.openjdk.org/jdk/pull/13568#issuecomment-1774130261


More information about the core-libs-dev mailing list