Array.sort should use AVX-512 SIMD sort on Zen 5 and later
Piotr Tarsa
piotr.tarsa at gmail.com
Thu Nov 28 17:19:59 UTC 2024
Hi,
I'm writing here because the hotspot mailing lists are full of
GitHub's discussions and nothing else.
Summary: instead of enabling AVX-512 SIMD sort on Intel CPUs only, the
quick fix should be to disable AVX-512 SIMD sort on Zen 4 only (so
keep it enabled on Zen 5 and future Zen CPUs).
Explanation:
In https://bugs.openjdk.org/browse/JDK-8317763 ("Follow-up to AVX512
intrinsics for Arrays.sort() PR"), one of the main changes is:
> 1) Restriction of the AVX512 sort acceleration to only Intel CPUs. A performance regression (due to micro-architectural differences) was reported for AMD Zen4 CPUs in the comments section of PR #14227.
That's too drastic. Instead, Zen 4 should be detected, i.e. instead of:
if (hasAvx512() && elementTypeIsSupported() && cpu.isIntel()) {
// use AVX-512 SIMD Arrays.sort
}
there should be:
if (hasAvx512() && elementTypeIsSupported() && !cpu.isAmdZen4()) {
// use AVX-512 SIMD Arrays.sort
}
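To make the proposed check more concrete, here is a minimal Java sketch of the gating logic. It is only an illustration: the class and method names are hypothetical, the real check would live in HotSpot's C++ CPU-detection code, and the Zen 4 test rests on the assumption that CPUID family 0x19 is shared by Zen 3 and Zen 4 and that only Zen 4 has AVX-512 (Zen 5 reports family 0x1A).

```java
public final class SimdSortGate {
    // Assumption: AMD's CPUID family 0x19 covers Zen 3 and Zen 4; Zen 5 is family 0x1A.
    static final int AMD_FAMILY_19H = 0x19;

    // Hypothetical helper, not a HotSpot API.
    // Zen 3 shares family 0x19 but lacks AVX-512, so family 0x19 plus AVX-512 implies Zen 4.
    static boolean isAmdZen4(String vendor, int family, boolean hasAvx512) {
        return "AuthenticAMD".equals(vendor) && family == AMD_FAMILY_19H && hasAvx512;
    }

    // The proposed condition: any AVX-512 CPU except Zen 4 gets the SIMD sort.
    static boolean useAvx512Sort(String vendor, int family, boolean hasAvx512,
                                 boolean elementTypeIsSupported) {
        return hasAvx512 && elementTypeIsSupported && !isAmdZen4(vendor, family, hasAvx512);
    }

    public static void main(String[] args) {
        System.out.println("Intel:  " + useAvx512Sort("GenuineIntel", 6, true, true));
        System.out.println("Zen 4:  " + useAvx512Sort("AuthenticAMD", 0x19, true, true));
        System.out.println("Zen 5:  " + useAvx512Sort("AuthenticAMD", 0x1A, true, true));
    }
}
```

With this shape, future Zen generations keep the SIMD sort by default and only the one known-slow microarchitecture is excluded.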
The slow performance of the AVX-512 version of x86-simd-sort (i.e. the
one used to speed up Java's Arrays.sort) on Zen 4 is most probably
explained by the AMD manuals, which can be found at:
https://www.amd.com/en/search/documentation/hub.html#q=software%20optimization%20guide%20for%20the%20amd%20microarchitecture&f-amd_document_type=Software%20Optimization%20Guides
Software Optimization Guide for the AMD Zen4 Microarchitecture
(https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/software-optimization-guides/57647.zip)
has the following remark in the "2.11.2 Code recommendations" chapter:
> Avoid the memory destination form of COMPRESS instructions. These forms are implemented using microcode and achieve a lower store bandwidth than their register destination forms which use fastpath macro ops.
Software Optimization Guide for the AMD Zen5 Microarchitecture
(https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/software-optimization-guides/58455.zip)
doesn't have any remark about COMPRESS instructions.
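For readers unfamiliar with the two forms mentioned in the Zen 4 guide, the scalar Java model below sketches the difference in what gets written. It is not SIMD code and not how x86-simd-sort is implemented; the names are made up for illustration, and a vector is assumed to hold 16 32-bit lanes.

```java
public final class CompressModel {
    static final int LANES = 16; // assumed: one 512-bit vector of 32-bit ints

    // Models the memory-destination form: only the selected lanes are
    // written, packed contiguously starting at dst[off].
    static int compressStore(int[] src, boolean[] mask, int[] dst, int off) {
        int n = 0;
        for (int i = 0; i < LANES; i++) {
            if (mask[i]) {
                dst[off + n++] = src[i];
            }
        }
        return n; // number of elements written
    }

    // Models the register-destination form followed by an ordinary
    // full-width store: selected lanes are packed into a temporary
    // "register" (unselected tail lanes are zero here), then all LANES
    // slots are written, so dst needs LANES slots of room at off.
    static int compressThenStore(int[] src, boolean[] mask, int[] dst, int off) {
        int[] reg = new int[LANES];
        int n = 0;
        for (int i = 0; i < LANES; i++) {
            if (mask[i]) {
                reg[n++] = src[i];
            }
        }
        System.arraycopy(reg, 0, dst, off, LANES);
        return n; // number of valid elements at the front of the store
    }
}
```

Both variants leave the same packed elements at the front of the destination; the second one also overwrites the slots behind them, which the caller has to tolerate.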
Additionally:
The ticket for the full fix, i.e.
https://bugs.openjdk.org/browse/JDK-8317976 ("Optimize SIMD sort for AMD
Zen 4"), points to a Reddit thread, which in turn points to a deleted
(inaccessible) commit on GitHub. The commit was archived, and the copy
is linked in https://github.com/intel/x86-simd-sort/issues/6#issuecomment-2506516404
Regards,
Piotr