Array.sort should use AVX-512 SIMD sort on Zen 5 and later
David Holmes
david.holmes at oracle.com
Fri Nov 29 02:24:19 UTC 2024
On 29/11/2024 3:19 am, Piotr Tarsa wrote:
> Hi,
>
> I'm writing here because the hotspot mailing lists are full of
> GitHub discussions and nothing else.
A hotspot mailing list is the place to discuss this. They may carry a lot
of GitHub discussion, but that is not all they carry. The discuss list is
certainly NOT the right place for this.
David
-----
> Summary: instead of enabling AVX-512 SIMD sort on Intel CPUs only, the
> quick fix should be to disable AVX-512 SIMD sort on Zen 4 only (so
> keep it enabled on Zen 5 and future Zen CPUs).
>
> Explanation:
>
> In https://bugs.openjdk.org/browse/JDK-8317763 ("Follow-up to AVX512
> intrinsics for Arrays.sort() PR"), one of the main changes is:
>> 1) Restriction of the AVX512 sort acceleration to only Intel CPUs. A performance regression (due to micro-architectural differences) was reported for AMD Zen4 CPUs in the comments section of PR #14227.
>
> That's too drastic. Only Zen 4 should be detected and excluded, i.e. instead of:
>
> if (hasAvx512() && elementTypeIsSupported() && cpu.isIntel()) {
> // use AVX-512 SIMD Array.sort
> }
>
> there should be:
>
> if (hasAvx512() && elementTypeIsSupported() && !cpu.isAmdZen4()) {
> // use AVX-512 SIMD Array.sort
> }
>
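The difference between the two quoted conditions is an allow-list versus a deny-list. Below is a minimal Java sketch of that difference. The `Microarch` enum and both method names are hypothetical and invented for illustration; this is not HotSpot code, and a real JVM would derive the microarchitecture from CPUID rather than take it as a parameter.

```java
import java.util.EnumSet;
import java.util.Set;

final class SimdSortPolicy {
    // Hypothetical labels; a real JVM would derive these from CPUID data.
    enum Microarch { INTEL, AMD_ZEN4, AMD_ZEN5, OTHER }

    // Deny-list: only microarchitectures with a reported regression.
    static final Set<Microarch> DENY = EnumSet.of(Microarch.AMD_ZEN4);

    // Allow-list policy: AVX-512 sort only on Intel CPUs.
    static boolean allowListPolicy(boolean hasAvx512, boolean typeSupported, Microarch cpu) {
        return hasAvx512 && typeSupported && cpu == Microarch.INTEL;
    }

    // Deny-list policy: AVX-512 sort everywhere except the listed CPUs.
    static boolean denyListPolicy(boolean hasAvx512, boolean typeSupported, Microarch cpu) {
        return hasAvx512 && typeSupported && !DENY.contains(cpu);
    }

    public static void main(String[] args) {
        // Print both decisions for every label, assuming AVX-512 and a supported type.
        for (Microarch cpu : Microarch.values()) {
            System.out.println(cpu
                    + " allow-list=" + allowListPolicy(true, true, cpu)
                    + " deny-list=" + denyListPolicy(true, true, cpu));
        }
    }
}
```

Under the deny-list variant, a future Zen generation gets the SIMD sort by default and has to be excluded explicitly if a regression shows up there.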
> The slow performance of the AVX-512 version of x86-simd-sort
> (i.e. the one used to speed up Java's Arrays.sort) on Zen 4 is most
> probably explained in the AMD manuals, which can be found at:
> https://www.amd.com/en/search/documentation/hub.html#q=software%20optimization%20guide%20for%20the%20amd%20microarchitecture&f-amd_document_type=Software%20Optimization%20Guides
>
> Software Optimization Guide for the AMD Zen4 Microarchitecture
> (https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/software-optimization-guides/57647.zip)
> has the following remark in the "2.11.2 Code recommendations" chapter:
>
>> Avoid the memory destination form of COMPRESS instructions. These forms are implemented using microcode and achieve a lower store bandwidth than their register destination forms which use fastpath macro ops.
>
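For readers unfamiliar with COMPRESS: it packs the lanes selected by a mask contiguously into the destination. The memory-destination form writes the packed lanes straight to memory. The register-destination form packs them into a register, and a separate store writes them out. The scalar Java model below only illustrates that packing. The class and method names are hypothetical, and the pivot comparison in `main` assumes the instruction is used in a partition-style step, which the quoted message does not state.

```java
import java.util.Arrays;

final class CompressSketch {
    // Scalar model of a masked compress: copies src[i] for every set mask
    // entry into dst starting at dstPos, packed contiguously.
    // Returns the number of elements written.
    static int compress(int[] src, boolean[] mask, int[] dst, int dstPos) {
        int written = 0;
        for (int i = 0; i < src.length; i++) {
            if (mask[i]) {
                dst[dstPos + written++] = src[i];
            }
        }
        return written;
    }

    public static void main(String[] args) {
        int[] lanes = {7, 2, 9, 4, 1, 8, 3, 6};
        int pivot = 5;

        // Build the mask: which lanes are below the pivot.
        boolean[] below = new boolean[lanes.length];
        for (int i = 0; i < lanes.length; i++) {
            below[i] = lanes[i] < pivot;
        }

        int[] out = new int[lanes.length];
        int n = compress(lanes, below, out, 0);
        System.out.println(Arrays.toString(Arrays.copyOf(out, n))); // prints [2, 4, 1, 3]
    }
}
```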
> Software Optimization Guide for the AMD Zen5 Microarchitecture
> (https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/software-optimization-guides/58455.zip)
> doesn't have any remark about COMPRESS instructions.
>
> Additionally:
> The ticket for the full fix, i.e.
> https://bugs.openjdk.org/browse/JDK-8317976 ("Optimize SIMD sort for AMD
> Zen 4"), points to a Reddit thread, which in turn points to a deleted
> (inaccessible) commit on GitHub. The commit was archived, and the copy
> is linked in https://github.com/intel/x86-simd-sort/issues/6#issuecomment-2506516404
>
>
> Regards,
> Piotr