RFR: 8371955: Support AVX10 floating point comparison instructions [v4]
Sandhya Viswanathan
sviswanathan at openjdk.org
Fri Jan 16 00:24:02 UTC 2026
On Fri, 9 Jan 2026 22:59:16 GMT, Mohamed Issa <missa at openjdk.org> wrote:
>> Intel® AVX10 ISA [1] extensions added new floating point comparison instructions. They set the EFLAGS register so that relationships can be tested independently to avoid extra checks when one of the inputs is NaN.
>>
>> Most of the work is covered in the architecture definition (`x86.ad`) file. A new comparison operand was created to be used by new CMove and JMP definitions with the APX specific portions of the CMove section being updated to rely on the new instructions because both sets of instructions are always expected to be available on the same platform. New floating point comparison definitions were also added.
>>
>> This change uses the new AVX10.2 (UCOMXSS or UCOMXSD) instructions on supported platforms to avoid the extra handling required with existing (UCOMISS or UCOMISD) instructions. To make sure no new failures were introduced, tier1, tier2, and tier3 tests were run on builds with and without the changes. Additionally, the JTREG tests listed below were used to verify correctness with `-XX:-UseAPX` / `-XX:+UseAPX` options. The baseline build used is [OpenJDK v26-b26](https://github.com/openjdk/jdk/releases/tag/jdk-26%2B26).
>>
>> 1. `jtreg:test/hotspot/jtreg/compiler/c2/irTests/CMoveLConstants.java`
>> 2. `jtreg:test/hotspot/jtreg/compiler/c2/irTests/TestFPComparison.java`
>> 3. `jtreg:test/hotspot/jtreg/compiler/intrinsics/math/TestSignumIntrinsic.java`
>> 4. `jtreg:test/hotspot/jtreg/compiler/vectorization/TestSignumVector.java`
>>
>> Finally, the JMH micro-benchmark listed below was updated to separately exercise CMove and JMP code paths.
>>
>> 1. `micro:test/micro/org/openjdk/bench/java/lang/FPComparison.java`
>>
>> [1] https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html?wapkw=AVX10
>
> Mohamed Issa has updated the pull request incrementally with one additional commit since the last revision:
>
> Remove extra jump instruction in signum floating point and unify three-way floating point comparison logic in x86.ad
src/hotspot/cpu/x86/x86.ad line 1706:
> 1704: Label done;
> 1705: __ setcc(Assembler::above, dst);
> 1706: __ jcc(Assembler::aboveEqual, done);
It would be better to keep the original sequence with optimization:
__ movl(dst, -1);
__ jcc(Assembler::below, done);
__ setcc(Assembler::notEqual, dst);
This would work for both AVX10_2 and prior as the Assembler::below covers both unordered and less.
src/hotspot/cpu/x86/x86.ad line 9186:
> 9184: %}
> 9185:
> 9186: instruct cmovI_regUCFE(cmpOpUCFE cop, rFlagsRegUCFE cr, rRegI dst, rRegI src) %{
The following instructs could be removed and only the ndd version need to be kept:
cmovI_regUCFE
cmovI_memUCFE
cmovN_regUCFE
cmovP_regUCFE
cmovL_regUCFE
cmovL_memUCFE
cmpOpUCFE/rFlagsRegUCFE should only match when both AVX10_2 and APX is available otherwise fallback to cmpOpUCF/cmpOfUCF2/rFlagsRegUCF.
We can then remove the predicates from cmovI_reg/cmovI_regUCFE_ndd etc.
This is to avoid explosion of ad file instructs for various combination of APX and AVX10_2.
src/hotspot/cpu/x86/x86.ad line 14609:
> 14607: ins_encode %{
> 14608: __ ucomxss($src1$$XMMRegister, $src2$$XMMRegister);
> 14609: emit_cmpfp3(masm, $dst$$Register);
It looks to me that the only difference between cmpF_reg and cmpF_regE is ucomxss vs ucomiss. In which case it is better to keep only the original cmpF_reg and remove cmpF_regE. Likewise for cmpF_memE, cmpF_immE, cmpD_regE, cmpD_memE, cmpD_immE.
-------------
PR Review Comment: https://git.openjdk.org/jdk/pull/28337#discussion_r2696383810
PR Review Comment: https://git.openjdk.org/jdk/pull/28337#discussion_r2696362925
PR Review Comment: https://git.openjdk.org/jdk/pull/28337#discussion_r2684261529
More information about the core-libs-dev
mailing list