Re[2]: Benchmarking of new optimized Dual-Pivot Quicksort

Vladimir Yaroslavskiy vlv.spb.ru at mail.ru
Fri Jul 26 10:03:40 UTC 2019


Hi Laurent,

Many thanks for feedback!

I run tests on two computers under Windows (8 processors) and Linux (88 processors).

The details of the first computer is: Window 10 (64-bit), Intel Core i7 8650U 1.90 GHz
and output of "uname -a" command for the second computer is:
Linux rho 4.15.0-54-generic #58-Ubuntu SMP Mon Jun 24 10:55:24 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux

The JDK on Windows is OpenJDK 64-Bit Server VM, build 14-ea+6-171,
on Linux - jdk13 latest version.

The setting for JVM running is (one option -Xbatch only):
java -Xbatch -classpath %CLASSES% Main

Also I uploaded the results from Linux computer for sequential sorting of int / long arrays.
details are there  https://github.com/iaroslavski/sorting/tree/master/benchmarking/results
The files <type>-s-<size> and <type>-p-<size> are related to the first (Windows) computer,
the files <type>-s88-<size> and <type>-p88-<size> are related to the second (Linux) computer.

The summary is here:

[int] sequential sorting, gain:
Windows: size 1M: 0.31; size 30K: 0.28; size: 150: 0.15
      Linux: size 1M: 0.31; size 30K: 0.27; size: 150: 0.07

[long] sequential sorting, gain:
Windows: size 1M: 0.32; size 30K: 0.23; size 150: 0.12
      Linux: size 1M: 0.30; size 30K: 0.23; size 150: 0.05 You can see that sorting shows the same results on large and medium sizes for both types,
and it works slower on small size on Linux, but shows positive gains.

Thank you,
Vladimir

>Пятница, 26 июля 2019, 10:43 +03:00 от Laurent Bourgès <bourges.laurent at gmail.com>:
>
>Hi Vladimir,
>
>First Thank you for these impressive results: 15% to 70% overall gains is amazing and will make a big difference, Congratulations !
>
>Could you publish the different test environment ?
>- HW: cpu (lscpu output)
>- OS version
>- JDK version + JVM settings 
>
>Ideally you or I could write a shell script to collect setup and write a simple log file.
>
>PS: I suppose you tested DPQS only on intel cpus, could somebody test on ppc or arm cpus as well ?
>
>Cheers,
>Laurent
>Le jeu. 25 juil. 2019 à 16:04, Vladimir Yaroslavskiy < vlv.spb.ru at mail.ru > a écrit :
>>Hi all,
>>With the help of Laurent Bourges I run benchmarking of new optimized Dual-Pivot Quicksort
>>and summarized results. The tests were run for all types (int / long / ... / double), for both types:
>>sequential and parallel sorting, for small, medium and large sizes. The comparison was done
>>on several data types, such as: random, period, sorted, equal, stagger, shuffle.
>>Here is the summary of total gains. The gain is defined as (jdk_time - dpqs_time) / jdk_time,
>>where jdk_time is sorting time by the current jdk14 and dpqs_time is sorting time by new
>>optimized Dual-Pivot Quicksort. To get stable and reproducible results, min time of many
>>invocations is taken.
>>You can find all detailed results there
>>https://github.com/iaroslavski/sorting/tree/master/benchmarking/results
>>
>>Sources of benchmarking tests are there
>>https://github.com/iaroslavski/sorting/tree/master/benchmarking/sources
>>
>>You can see not good performance for float / double types (parallel sorting).
>>This behavior can be explained by existing bug in jdk, when NaNs and -0.0s
>>are not processed properly. New sorting has total losses for small float / double
>>arrays, few percents only. For all other cases new optimized sorting is winner.
>>
>>--------------------------------------------------------------------------------------------------------
>>[int]
>>sequential, Length = 150, Gain: 0.15
>>sequential, Length = 30000, Gain: 0.28
>>sequential, Length = 1000000, Gain: 0.31
>>parallel 8, Length = 150, Gain: 0.14
>>parallel 8, Length = 30000, Gain: 0.15
>>parallel 8, Length = 1000000, Gain: 0.14
>>[long]
>>sequential, Length = 150, Gain: 0.12
>>sequential, Length = 30000, Gain: 0.23
>>sequential, Length = 1000000, Gain: 0.32
>>parallel 8, Length = 150, Gain: 0.11
>>parallel 8, Length = 30000, Gain: 0.16
>>parallel 8, Length = 1000000, Gain: 0.24
>>parallel 88 processors, Length = 1000000, Gain: 0.05
>>[byte]
>>sequential, Length = 150, Gain: 0.06
>>sequential, Length = 30000, Gain: 0.36
>>sequential, Length = 1000000, Gain: 0.37
>>parallel 8, Length = 150, Gain: 0.13
>>parallel 8, Length = 30000, Gain: 0.73
>>parallel 8, Length = 1000000, Gain: 0.65
>>[char]
>>sequential, Length = 150, Gain: 0.10
>>sequential, Length = 30000, Gain: 0.00
>>sequential, Length = 1000000, Gain: 0.17
>>parallel 8, Length = 150, Gain: 0.10
>>parallel 8, Length = 30000, Gain: 0.33
>>parallel 8, Length = 1000000, Gain: 0.62
>>[short]
>>sequential, Length = 150, Gain: 0.14
>>sequential, Length = 30000, Gain: 0.10
>>sequential, Length = 1000000, Gain: 0.18
>>parallel 8, Length = 150, Gain: 0.13
>>parallel 8, Length = 30000, Gain: 0.41
>>parallel 8, Length = 1000000, Gain: 0.65
>>[float]
>>sequential, Length = 150, Gain: -0.02
>>sequential, Length = 30000, Gain: 0.21
>>sequential, Length = 1000000, Gain: 0.24
>>parallel 8, Length = 150, Gain: -0.05
>>parallel 8, Length = 30000, Gain: -0.04
>>parallel 8, Length = 1000000, Gain: -0.12
>>[double]
>>sequential, Length = 150, Gain: -0.03
>>sequential, Length = 30000, Gain: 0.19
>>sequential, Length = 1000000, Gain: 0.23
>>parallel 8, Length = 150, Gain: -0.05
>>parallel 8, Length = 30000, Gain: 0.05
>>parallel 8, Length = 1000000, Gain: 0.15
>>Thank you,
>>Vladimir


More information about the core-libs-dev mailing list