RFR: 8352675: Support Intel AVX10 converged vector ISA feature detection [v9]

Fri May 2 20:54:47 UTC 2025

On Fri, 2 May 2025 11:31:01 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> - Intel AVX10[1] extends and enhances the capabilities of Intel AVX-512 to benefit all Intel® products and will be the vector ISA of choice moving into the future. 
>> - It supports a new ISA versioning scheme that simplifies the existing AVX-512 feature enumeration scheme. The feature set supported by a given AVX10 ISA version is also supported by every later version.
>> - The initial, fully-featured version of Intel® AVX10 will be enumerated as Version 2 (denoted as Intel® AVX10.2). It will include new ISA extensions on top of the existing AVX-512 instructions.
>> - An early version of Intel® AVX10 (Version 1, or Intel® AVX10.1) that only enumerates the Intel® AVX-512 instruction set at 128, 256, and 512 bits will be enabled on the Granite Rapids Server for software pre-enabling.
>> 
>> This patch adds the necessary CPUID feature detection for AVX10 ISA versions 1 and 2. In terms of architectural state save/restore, AVX10 is isomorphic to AVX-512 support up to Granite Rapids. The state components affected by the AVX10 extension are the SSE, AVX, Opmask, ZMM_Hi256, and Hi16_ZMM registers.
>> 
>> The patch has been regression tested with tier1 and JVMCI tests.
>> 
>> Please review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>> 
>> [1] https://www.intel.com/content/www/us/en/content-details/844829/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html
>
> Jatin Bhateja has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
> 
>   Refactoring code to create a separate VM_Features class

src/hotspot/cpu/x86/vm_version_x86.cpp line 464:

> 462:     __ movl(rcx, 0x18000000); // cpuid1 bits osxsave | avx
> 463:     __ andl(rcx, Address(rsi, 8)); // cpuid1 bits osxsave | avx
> 464:     __ jccb(Assembler::equal, done); // jump if AVX is not supported

This doesn't have the same effect as before. Consider the input 0x10000000: with this code the andl result is nonzero, so the jump to done is not taken. Prior to this change, the cmpl against 0x18000000 would fail the equality check, so the jump to done would be taken. The same problem applies to every place where we check more than one set bit.

src/hotspot/cpu/x86/vm_version_x86.cpp line 468:

> 466:     __ movl(rax, 0x6);
> 467:     __ andl(rax, Address(rbp, in_bytes(VM_Version::xem_xcr0_offset()))); // xcr0 bits sse | ymm
> 468:     __ jccb(Assembler::notEqual, start_simd_check); // return if AVX is not supported

See the prior comment; the cmpl and conditional jump are needed here as well.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2072134109
PR Review Comment: https://git.openjdk.org/jdk/pull/24329#discussion_r2072136639