RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v2]

Tue Mar 25 00:32:17 UTC 2025

On Fri, 21 Mar 2025 20:33:47 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> This bugfix patch adds special handling, as per the x86 AVX512-FP16 ISA specification [1][2], for max/min operations with +/-0.0 or NaN operands.
>> 
>> The special handling leverages the instruction semantics. The central idea is to shuffle the operands so that the smaller input is assigned to the second operand for a min operation, or the larger input is assigned to the second operand for a max operation. In addition, the result is NaN if an unordered comparison detects that the first input is NaN; otherwise we return the result of the min/max operation.
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>> 
>> [1] https://www.felixcloutier.com/x86/vminsh
>> [2] https://www.felixcloutier.com/x86/vmaxsh
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Minor cleanup
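
For readers following the description quoted above, here is a minimal Java sketch of the operand-shuffle idea. It is not the patch's code: it uses float as a stand-in for Float16, the class and method names (Float16MinMaxSketch, hwMin, hwMax) are hypothetical, and it assumes the instruction behaves as documented in [1][2], i.e. the second operand is returned when the inputs are unordered or are both zeros. Choosing the operand order from the sign bit of the first input is also an assumption made for illustration.

```java
final class Float16MinMaxSketch {
    // Assumed model of the x86 scalar min: the second operand is returned
    // when either input is NaN or when both inputs compare equal (+0.0 vs -0.0).
    static float hwMin(float src1, float src2) {
        return src1 < src2 ? src1 : src2;
    }

    // Assumed model of the x86 scalar max, with the same second-operand rule.
    static float hwMax(float src1, float src2) {
        return src1 > src2 ? src1 : src2;
    }

    static boolean signBit(float f) {
        return (Float.floatToRawIntBits(f) & 0x80000000) != 0;
    }

    // Java-style min: -0.0 is smaller than +0.0, and NaN wins over any value.
    static float min(float a, float b) {
        boolean neg = signBit(a);
        float first = neg ? b : a;   // shuffle: a negative (smaller) input goes second
        float second = neg ? a : b;
        float r = hwMin(first, second);
        // A NaN in the second operand is already propagated by hwMin;
        // a NaN in the first operand needs the explicit unordered check.
        return Float.isNaN(first) ? first : r;
    }

    // Java-style max: +0.0 is larger than -0.0, and NaN wins over any value.
    static float max(float a, float b) {
        boolean neg = signBit(a);
        float first = neg ? a : b;   // shuffle: a non-negative (larger) input goes second
        float second = neg ? b : a;
        float r = hwMax(first, second);
        return Float.isNaN(first) ? first : r;
    }

    public static void main(String[] args) {
        System.out.println(min(0.0f, -0.0f));
        System.out.println(max(-0.0f, 0.0f));
        System.out.println(min(1.0f, Float.NaN));
    }
}
```

Under these assumptions the sketch agrees with Math.min and Math.max for signed zeros, NaN and ordinary values, which is the behaviour the special handling is meant to provide.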

src/hotspot/cpu/x86/assembler_x86.cpp line 13758:

> 13756:   attributes.set_is_evex_instruction();
> 13757:   attributes.set_embedded_opmask_register_specifier(mask);
> 13758:   attributes.reset_is_clear_context();

Why do we call reset_is_clear_context() here? We want the kdst bits to be set or cleared, with no merge context.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 7093:

> 7091: }
> 7092: 
> 7093: void C2_MacroAssembler::scalar_max_min_fp16(int opcode, XMMRegister dst, XMMRegister src1, XMMRegister src2,

Is there a reason we are not doing this along the lines of the scalar emit_fp_min_max? For the most common cases, an emit_fp_min_max-based sequence would have much better latency.

src/hotspot/cpu/x86/x86.ad line 1466:

> 1464:     case Op_MaxHF:
> 1465:     case Op_MinHF:
> 1466:       if (!VM_Version::supports_avx512bw()) {

This check should be supports_avx512vlbw(), since scalar_max_min_fp16 needs avx512vl as well.

src/hotspot/cpu/x86/x86.ad line 1469:

> 1467:         return false;
> 1468:       }
> 1469:     case Op_AddHF:

Please add a comment here indicating fall through.
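
As an illustration of the kind of comment meant here, a small sketch follows. It is written in Java rather than the C++ of x86.ad, and the enum, method name and boolean parameters are hypothetical stand-ins for the opcodes and the VM_Version checks; the check under ADD_HF is assumed for illustration.

```java
final class FallThroughSketch {
    enum Op { MAX_HF, MIN_HF, ADD_HF }

    // Hypothetical stand-in for the match-rule check; the flags replace
    // the VM_Version queries used in the real code.
    static boolean supported(Op op, boolean hasAvx512vlbw, boolean hasAvx512Fp16) {
        switch (op) {
            case MAX_HF:
            case MIN_HF:
                if (!hasAvx512vlbw) {
                    return false;
                }
                // fall through: max/min must also pass the check shared with ADD_HF
            case ADD_HF:
                if (!hasAvx512Fp16) {
                    return false;
                }
                break;
        }
        return true;
    }
}
```

The explicit comment makes it clear to the next reader that the missing break is intentional.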

test/hotspot/jtreg/compiler/intrinsics/float16/TestFloat16MaxMinSpecialValues.java line 33:

> 31:  * @library /test/lib /
> 32:  * @summary Add special case handling for Float16.max/min x86 backend
> 33:  * @requires (os.simpleArch == "x64" & vm.cpu.features ~= ".*avx512_fp16.*" & vm.cpu.features ~= ".*avx512bw.*")

avx512vl is also needed here.

test/hotspot/jtreg/compiler/intrinsics/float16/TestFloat16MaxMinSpecialValues.java line 57:

> 55: 
> 56:     @Run(test = "testMaxNaNOperands")
> 57:     @Warmup(1000)

Warmup could also be removed.

test/hotspot/jtreg/compiler/intrinsics/float16/TestFloat16MaxMinSpecialValues.java line 59:

> 57:     @Warmup(1000)
> 58:     public void launchMaxNaNOperands() {
> 59:         for (int i = 0; i < 10000; i++) {

The loop could be removed throughout this test; there is no need to test 10000 values.

test/hotspot/jtreg/compiler/intrinsics/float16/TestFloat16MaxMinSpecialValues.java line 63:

> 61:             RES = testMaxNaNOperands(SRC, Float16.NaN);
> 62:             if (!RES.equals(Float16.NaN)) {
> 63:                 throw new AssertionError("input1 = NaN, input2 = " + SRC.floatValue() + ", expected = NaN, actual = " + RES.floatValue());

input1 is not NaN here; NaN is passed as the second argument.
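
One way to avoid this kind of mislabeling is to build the message from the actual arguments instead of hard-coding "NaN" for one of them. A minimal sketch, using float as a stand-in for Float16 and a hypothetical helper name (failureMessage):

```java
final class FailureMessageSketch {
    // Hypothetical helper: formats the failure text from the real inputs,
    // so the labels always match the order in which the arguments were passed.
    static String failureMessage(float input1, float input2, float expected, float actual) {
        return "input1 = " + input1 + ", input2 = " + input2
                + ", expected = " + expected + ", actual = " + actual;
    }
}
```

With such a helper, the call site would pass SRC as input1 and NaN as input2, mirroring the argument order of the test method.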

test/hotspot/jtreg/compiler/intrinsics/float16/TestFloat16MaxMinSpecialValues.java line 94:

> 92:             RES = testMinNaNOperands(SRC, Float16.NaN);
> 93:             if (!RES.equals(Float16.NaN)) {
> 94:                 throw new AssertionError("input1 = NaN, input2 = " + SRC.floatValue() + ", expected = NaN, actual = " + RES.floatValue());

input1 is not NaN here; NaN is passed as the second argument.

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2011090835
PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2010957028
PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2010941955
PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2010884350
PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2010958579
PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2011007302
PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2010960699
PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2010960003
PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2010961348