RFR: JDK-8266431: Dual-Pivot Quicksort improvements (Radix sort) [v15]

Wed Aug 10 09:17:43 UTC 2022

On Tue, 9 Aug 2022 06:49:53 GMT, Peter Levart <plevart at openjdk.org> wrote:

>> Here are JMH test results on my stable perf laptop (cpu fixed: 4 real cpu cores at 2ghz, HT disabled, i7 6820k):
>> https://github.com/bourgesl/bourgesl.github.io/blob/master/bench-220709-summary-2.log
>> 
>> It confirms Vladimir results: 50% gain in average in parallelSort(), huge gains (x5) on large random arrays...
>> 
>> As DPQS relies on ForkJoinPool.getCommonParallelism = 3 on my machine, MT speedup is only x3 ! 
>> I observed max 300%/400% cpu load when parallelSort() runs on large arrays...
>> Why does it return NCpuCore - 1 ?
>
> Hi @bourgesl, 
> In order to convince reviewers of the benefits (and lack of drawbacks) of your effort, you could present the results of benchmarks in a friendlier way. Looking at the published JMH code and published results, I can only suspect the results labeled with "new*" are actually obtained by running the "test*" JMH benchmarks with the "patched" JDK and the published "test*" results are obtained by running with "unpatched" JDK? If you want to present the results in a friendlier way, graph them. It's easy. Just publish the unmodified pre and post JMH outputs as two public GISTs and then point to them with this tool: https://jmh.morethan.io/ ...
> CPU time / real time is one aspect of performance. Allocation is another. JMH has an option to measure both. Show that your claim about less aggressive memory allocation is true. It would be interesting also to show the effect of a fall-back mechanism when the allocation of buffers is not successful (again in a pre/post fashion).
> I can see there is a lot of effort put into this change. It would be a shame that it doesn't gain enough interest just because it requires too much effort to study it.

Hi @plevart 
Thanks for your advices, we will run again jmh tests within 1 week using json output, required by jmh analyzer on the latest dpqs patch.

I agree the patched/unpatched labels are better than new/test labels.

I will write a basic test to sort 1million integers that fails before/passes now using low max heap-mem ~ 64m...

Laurent

-------------

PR: https://git.openjdk.org/jdk/pull/3938