RFR: 8317763: Follow-up to AVX512 intrinsics for Arrays.sort() PR [v5]
Danny Thomas
duke at openjdk.org
Thu Oct 12 03:47:18 UTC 2023
On Wed, 11 Oct 2023 20:58:23 GMT, Srinivas Vamsi Parasa <duke at openjdk.org> wrote:
>> The goal of this PR is to address the follow-up comments to the SIMD accelerated sort PR (#14227) which implemented AVX512 intrinsics for Arrays.sort() methods.
>> The proposed changes are:
>>
>> 1) Restriction of the AVX512 sort acceleration to only Intel CPUs. A performance regression (due to micro-architectural differences) was reported for AMD Zen4 CPUs in the comments section of PR.
>> 2) Addressing the build failure due to a bug in GCC 12 (which was fixed in version 12.3.1). The details of the bug are at: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=105593
>> 3) Minor changes in Javadoc strings
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>
> Revert @ForceInline annotations for small array sort methods
At least on Sapphire Rapids, the [emulation suggested here](https://github.com/natmaurice/x86-simd-sort/commit/41d03b2d8f3b62a2ee6a3a97a8da7f193a407026) imposes only about a 6% penalty for `intSort`, while also mitigating the performance issue on Zen 4.
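For readers unfamiliar with the operation being emulated: judging from the branch name (`emulate-compressstoreu`), the change replaces the AVX512 masked compress-store used when partitioning. The following is only a scalar Java model of what a compress-store does semantically, not the code from the linked commit; the class and method names are hypothetical.

```java
import java.util.Arrays;

public final class CompressStoreSketch {

    /**
     * Scalar model of a masked compress-store (hypothetical helper, for illustration only).
     * Copies the lanes of src whose mask bit is set into dst, contiguously and in lane
     * order, starting at dstOffset. Returns the number of lanes written.
     */
    public static int compressStore(int[] dst, int dstOffset, long mask, int[] src) {
        int written = 0;
        for (int lane = 0; lane < src.length; lane++) {
            if (((mask >>> lane) & 1L) != 0) {
                dst[dstOffset + written] = src[lane];
                written++;
            }
        }
        return written;
    }

    public static void main(String[] args) {
        int[] lanes = {7, 3, 9, 1, 8, 2, 6, 4};
        int pivot = 5;

        // Build a per-lane mask: bit i is set when lanes[i] < pivot.
        long lessThanPivot = 0L;
        for (int i = 0; i < lanes.length; i++) {
            if (lanes[i] < pivot) {
                lessThanPivot |= 1L << i;
            }
        }
        long allLanes = (1L << lanes.length) - 1;

        // Partition step: smaller elements first, then the rest.
        int[] out = new int[lanes.length];
        int n = compressStore(out, 0, lessThanPivot, lanes);
        compressStore(out, n, ~lessThanPivot & allLanes, lanes);

        System.out.println(Arrays.toString(out)); // prints [3, 1, 2, 4, 7, 9, 8, 6]
    }
}
```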
Configuration summary:
* Name: linux-x86_64-server-release
* Debug level: release
* HS debug level: product
* JVM variants: server
* JVM features: server: 'cds compiler1 compiler2 epsilongc g1gc jfr jni-check jvmci jvmti management parallelgc serialgc services shenandoahgc vm-structs zgc'
* OpenJDK target: OS: linux, CPU architecture: x86, address length: 64
* Version string: 22-internal-adhoc.nfsuper.jdk (22-internal)
* Source date: 1697078366 (2023-10-12T02:39:26Z)
Tools summary:
* Boot JDK: openjdk version "21" 2023-09-19 OpenJDK Runtime Environment Zulu21.28+86-SA (build 21+35) OpenJDK 64-Bit Server VM Zulu21.28+86-SA (build 21+35, mixed mode, sharing) (at /usr/lib/jvm/zulu-21-amd64)
* Toolchain: gcc (GNU Compiler Collection)
* C Compiler: Version 11.4.0 (at /usr/bin/gcc)
* C++ Compiler: Version 11.4.0 (at /usr/bin/g++)
https://github.com/openjdk/jdk/compare/master...DanielThomas:jdk:dannyt/emulate-compressstoreu?expand=1
## Intel(R) Xeon(R) Platinum 8488C - Current
Benchmark            (size)  Mode  Cnt      Score      Error  Units
ArraysSort.intSort       10  avgt    3      0.043 ±    0.006  us/op
ArraysSort.intSort       25  avgt    3      0.082 ±    0.002  us/op
ArraysSort.intSort       50  avgt    3      0.205 ±    0.022  us/op
ArraysSort.intSort       75  avgt    3      0.394 ±    0.048  us/op
ArraysSort.intSort      100  avgt    3      0.625 ±    0.003  us/op
ArraysSort.intSort     1000  avgt    3      5.759 ±    1.111  us/op
ArraysSort.intSort    10000  avgt    3     51.680 ±    3.568  us/op
ArraysSort.intSort   100000  avgt    3    777.339 ±   25.809  us/op
ArraysSort.intSort  1000000  avgt    3   8848.261 ±  954.475  us/op
## Intel(R) Xeon(R) Platinum 8488C - Emulated
Benchmark            (size)  Mode  Cnt      Score      Error  Units
ArraysSort.intSort       10  avgt    3      0.046 ±    0.002  us/op
ArraysSort.intSort       25  avgt    3      0.083 ±    0.004  us/op
ArraysSort.intSort       50  avgt    3      0.214 ±    0.022  us/op
ArraysSort.intSort       75  avgt    3      0.411 ±    0.038  us/op
ArraysSort.intSort      100  avgt    3      0.658 ±    0.022  us/op
ArraysSort.intSort     1000  avgt    3      6.411 ±    0.497  us/op
ArraysSort.intSort    10000  avgt    3     55.996 ±    3.155  us/op
ArraysSort.intSort   100000  avgt    3    822.805 ±   40.223  us/op
ArraysSort.intSort  1000000  avgt    3   9487.974 ±  216.146  us/op
## Intel(R) Xeon(R) Platinum 8488C - Baseline
Benchmark            (size)  Mode  Cnt      Score      Error  Units
ArraysSort.intSort       10  avgt    3      0.047 ±    0.006  us/op
ArraysSort.intSort       25  avgt    3      0.099 ±    0.022  us/op
ArraysSort.intSort       50  avgt    3      0.249 ±    0.024  us/op
ArraysSort.intSort       75  avgt    3      0.438 ±    0.046  us/op
ArraysSort.intSort      100  avgt    3      0.590 ±    0.079  us/op
ArraysSort.intSort     1000  avgt    3      8.384 ±    1.852  us/op
ArraysSort.intSort    10000  avgt    3    435.589 ±   23.647  us/op
ArraysSort.intSort   100000  avgt    3   5380.658 ±  491.435  us/op
ArraysSort.intSort  1000000  avgt    3  63857.189 ± 2746.106  us/op
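To make the three Xeon tables easier to compare, here is a minimal sketch that computes the relative penalty of the emulated build against the current one, and the speedup of the emulated build over the baseline, for the larger sizes. The scores are copied from the tables above; the class and method names are hypothetical. With only three iterations per size and the error bars shown, small differences should be read with caution.

```java
import java.util.Locale;

public final class SortOverheadSketch {

    /** Percentage change of candidate relative to reference (positive means slower). */
    public static double percentChange(double reference, double candidate) {
        return (candidate - reference) / reference * 100.0;
    }

    public static void main(String[] args) {
        // Scores in us/op for the Xeon Platinum 8488C, copied from the tables above.
        int[] sizes       = {1000, 10000, 100000, 1000000};
        double[] current  = {5.759, 51.680, 777.339, 8848.261};
        double[] emulated = {6.411, 55.996, 822.805, 9487.974};
        double[] baseline = {8.384, 435.589, 5380.658, 63857.189};

        for (int i = 0; i < sizes.length; i++) {
            double penalty = percentChange(current[i], emulated[i]);
            double speedup = baseline[i] / emulated[i];
            System.out.printf(Locale.ROOT,
                    "size=%d  emulated vs current: %+.1f%%  baseline/emulated: %.1fx%n",
                    sizes[i], penalty, speedup);
        }
    }
}
```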
## AMD EPYC 9R14 - Emulated
$ make test TEST="micro:java.lang.ArraysSort.intSort"
Benchmark            (size)  Mode  Cnt      Score      Error  Units
ArraysSort.intSort       10  avgt    3      0.032 ±    0.001  us/op
ArraysSort.intSort       25  avgt    3      0.067 ±    0.002  us/op
ArraysSort.intSort       50  avgt    3      0.196 ±    0.002  us/op
ArraysSort.intSort       75  avgt    3      0.429 ±    0.046  us/op
ArraysSort.intSort      100  avgt    3      0.614 ±    0.025  us/op
ArraysSort.intSort     1000  avgt    3      6.500 ±    0.084  us/op
ArraysSort.intSort    10000  avgt    3     55.620 ±    0.943  us/op
ArraysSort.intSort   100000  avgt    3    669.347 ±   75.432  us/op
ArraysSort.intSort  1000000  avgt    3   9459.001 ±  201.298  us/op
Finished running test 'micro:java.lang.ArraysSort.intSort'
## AMD EPYC 9R14 - Baseline
$ make test TEST="micro:java.lang.ArraysSort.intSort" MICRO="VM_OPTIONS=-XX:UseAVX=2"
Benchmark            (size)  Mode  Cnt      Score      Error  Units
ArraysSort.intSort       10  avgt    3      0.035 ±    0.016  us/op
ArraysSort.intSort       25  avgt    3      0.091 ±    0.009  us/op
ArraysSort.intSort       50  avgt    3      0.245 ±    0.002  us/op
ArraysSort.intSort       75  avgt    3      0.412 ±    0.004  us/op
ArraysSort.intSort      100  avgt    3      0.531 ±    0.003  us/op
ArraysSort.intSort     1000  avgt    3      8.803 ±    0.609  us/op
ArraysSort.intSort    10000  avgt    3    254.413 ±  153.004  us/op
ArraysSort.intSort   100000  avgt    3   4485.811 ±   17.517  us/op
ArraysSort.intSort  1000000  avgt    3  56552.132 ± 3124.280  us/op
-------------
PR Comment: https://git.openjdk.org/jdk/pull/16124#issuecomment-1758865865