Array.sort should use AVX-512 SIMD sort on Zen 5 and later

David Holmes david.holmes at oracle.com
Fri Nov 29 02:24:19 UTC 2024


On 29/11/2024 3:19 am, Piotr Tarsa wrote:
> Hi,
> 
> I'm writing here because the hotspot mailing lists are full of
> GitHub's discussions and nothing else.

A hotspot mailing list is the place to discuss this. Those lists may carry 
a lot of GitHub discussion, but that is not all they carry. The discuss 
list is certainly NOT the right place to discuss this.

David
-----

> Summary: instead of enabling AVX-512 SIMD sort on Intel CPUs only, the
> quick fix should be to disable AVX-512 SIMD sort on Zen 4 only (so
> keep it enabled on Zen 5 and future Zen CPUs).
> 
> Explanation:
> 
> In https://bugs.openjdk.org/browse/JDK-8317763 ("Follow-up to AVX512
> intrinsics for Arrays.sort() PR"), one of the main changes is:
>> 1) Restriction of the AVX512 sort acceleration to only Intel CPUs. A performance regression (due to micro-architectural differences) was reported for AMD Zen4 CPUs in the comments section of PR #14227.
> 
> That's too drastic. Zen 4 should be detected specifically, i.e. instead of:
> 
> if (hasAvx512() && elementTypeIsSupported() && cpu.isIntel()) {
>     // use AVX-512 SIMD Arrays.sort
> }
> 
> there should be:
> 
> if (hasAvx512() && elementTypeIsSupported() && !cpu.isAmdZen4()) {
>     // use AVX-512 SIMD Arrays.sort
> }
> 
> The slow performance of the AVX512 version of x86-simd-sort (i.e. the
> one used to speed up Java's Arrays.sort) on Zen 4 is most probably
> explained in the AMD manuals, which can be found at:
> https://www.amd.com/en/search/documentation/hub.html#q=software%20optimization%20guide%20for%20the%20amd%20microarchitecture&f-amd_document_type=Software%20Optimization%20Guides
> 
> Software Optimization Guide for the AMD Zen4 Microarchitecture
> (https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/software-optimization-guides/57647.zip)
> has the following remark in the "2.11.2 Code recommendations" chapter:
> 
>> Avoid the memory destination form of COMPRESS instructions. These forms are implemented using microcode and achieve a lower store bandwidth than their register destination forms which use fastpath macro ops.
> 
> Software Optimization Guide for the AMD Zen5 Microarchitecture
> (https://www.amd.com/content/dam/amd/en/documents/processor-tech-docs/software-optimization-guides/58455.zip)
> doesn't have any remark about COMPRESS instructions.
> 
> Additionally:
> The ticket about the full fix, i.e.
> https://bugs.openjdk.org/browse/JDK-8317976 Optimize SIMD sort for AMD
> Zen 4, points to a Reddit thread, which in turn points to a deleted
> (inaccessible) commit on GitHub. The commit was archived and the copy
> is linked in https://github.com/intel/x86-simd-sort/issues/6#issuecomment-2506516404
> 
> 
> Regards,
> Piotr
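
The gating proposed in the quoted message can be sketched as follows, in Java and for illustration only. The class SimdSortGate and its methods are hypothetical and are not HotSpot's API (the real check would live in the JVM's C++ CPU-feature code). The sketch assumes that CPUID family 0x19 is shared by Zen 3 and Zen 4, that of those two only Zen 4 implements AVX-512, and that Zen 5 reports family 0x1A, so "AMD, family 0x19, AVX-512 present" is used as a stand-in for "Zen 4".

```java
public final class SimdSortGate {
    // CPUID vendor string reported by AMD processors.
    private static final String AMD_VENDOR = "AuthenticAMD";

    // Assumption: Zen 3 and Zen 4 both report family 0x19, and only
    // Zen 4 has AVX-512. Zen 5 is assumed to report family 0x1A.
    private static final int ZEN3_ZEN4_FAMILY = 0x19;

    private SimdSortGate() {}

    // Hypothetical Zen 4 check: AMD vendor, family 0x19, AVX-512 present.
    public static boolean isAmdZen4(String vendor, int family, boolean hasAvx512) {
        return AMD_VENDOR.equals(vendor)
                && family == ZEN3_ZEN4_FAMILY
                && hasAvx512;
    }

    // Deny-list gating: use the AVX-512 sort everywhere it is available,
    // except on Zen 4 (instead of allowing it on Intel only).
    public static boolean useAvx512Sort(String vendor, int family,
                                        boolean hasAvx512,
                                        boolean elementTypeSupported) {
        return hasAvx512
                && elementTypeSupported
                && !isAmdZen4(vendor, family, hasAvx512);
    }

    public static void main(String[] args) {
        // Intel CPU with AVX-512: enabled.
        System.out.println(useAvx512Sort("GenuineIntel", 0x06, true, true));
        // AMD family 0x19 with AVX-512 (treated as Zen 4): disabled.
        System.out.println(useAvx512Sort("AuthenticAMD", 0x19, true, true));
        // AMD family 0x1A with AVX-512 (treated as Zen 5): enabled.
        System.out.println(useAvx512Sort("AuthenticAMD", 0x1A, true, true));
    }
}
```

With a deny-list like this, future AMD families keep the AVX-512 path by default, whereas the Intel-only allow-list excludes them until someone revisits the check.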


