RFR: 8360116: Add support for AVX10 floating point minmax instruction [v2]

Wed Jun 25 15:45:33 UTC 2025

On Wed, 25 Jun 2025 10:41:04 GMT, Jatin Bhateja <jbhateja at openjdk.org> wrote:

>> Intel@ AVX10 ISA [1] extensions added new floating point MIN/MAX instructions which comply with definitions in IEEE-754-2019 standard section 9.6 and can directly emulate Math.min/max semantics without the need for any special handling for NaN, +0.0 or -0.0 detection.
>> 
>> **The following pseudo-code describes the existing algorithm for min/max[FD]:**
>> 
>> Move the non-negative value to the second operand; this will ensure that we correctly handle 0.0 and -0.0 values, if values being compared are both 0.0s (of either sign), the value in the second operand (source operand) is returned. Existing MINPS and MAXPS semantics only check for NaN as the second operand; hence, we need special handling to check for NaN at the first operand.
>> 
>>   btmp = (b < +0.0) ? a : b
>>   atmp = (b < +0.0) ? b : a
>>   Tmp = Max_Float(atmp , btmp)
>>   Res = (atmp == NaN) ? atmp : Tmp
>> 
>> For min[FD] we need a small tweak in the above algorithm, i.e., move the non-negative value to the first operand, this will ensure that we correctly select -0.0 if both the operands being compared are 0.0 or -0.0.
>> 
>>   btmp = (b < +0.0) ? b : a
>>   atmp = (b < +0.0) ? a : b
>>   Tmp = Max_Float(atmp , btmp)
>>   Res = (atmp == NaN) ? atmp : Tmp
>> 
>> Thus, we need additional special handling for NaNs and +/-0.0 to compute floating-point min/max values to comply with the semantics of Math.max/min APIs using existing MINPS / MAXPS instructions. AVX10.2 added a new instruction, VPMINMAX[SH,SS,SD]/[PH,PS,PD], which comprehensively handles special cases, thereby eliminating the need for special handling.
>> 
>> Patch emits new instructions for reduction and non-reduction operations for single, double, and Float16 type.
>> 
>> Kindly review and share your feedback.
>> 
>> Best Regards,
>> Jatin
>> 
>> [1] https://www.intel.com/content/www/us/en/content-details/856721/intel-advanced-vector-extensions-10-2-intel-avx10-2-architecture-specification.html?wapkw=AVX10
>
> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update comments

Thank you for implementing these new instructions! I had a look at your changes and have a few minor suggestions and questions. I am quite new to this part of the codebase, so feel free to disagree if I am way off base.

How did you test these changes?

Also, if you merge the current master branch, the Windows build failures in the Github Actions will be fixed.

src/hotspot/cpu/x86/assembler_x86.cpp line 8693:

> 8691: }
> 8692: 
> 8693: 

Suggestion:

Nit: superfluous empty line

src/hotspot/cpu/x86/assembler_x86.cpp line 8785:

> 8783: void Assembler::evminmaxps(XMMRegister dst, KRegister mask, XMMRegister nds, XMMRegister src, bool merge, int imm8, int vector_len) {
> 8784:   assert(VM_Version::supports_avx10_2(), "");
> 8785:   InstructionAttr attributes(vector_len, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ false,/* uses_vl */ true);

Suggestion:

  InstructionAttr attributes(vector_len, /* vex_w */ false, /* legacy_mode */ false, /* no_mask_reg */ false, /* uses_vl */ true);

Nit: missing space

src/hotspot/cpu/x86/assembler_x86.hpp line 2752:

> 2750:   void eminmaxss(XMMRegister dst, XMMRegister nds, XMMRegister src, int imm8);
> 2751:   void eminmaxsd(XMMRegister dst, XMMRegister nds, XMMRegister src, int imm8);
> 2752:   void evminmaxph(XMMRegister dst, KRegister mask, XMMRegister nds, XMMRegister src, bool merge, int imm8, int vector_len);

Is there a reason `evminmaxph` does not have a version where `src` has type `Address`?

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1241:

> 1239: }
> 1240: 
> 1241: void C2_MacroAssembler::vminmax_fp(int opc, BasicType elem_bt, XMMRegister dst, KRegister mask,

Line 1122 mentions the differences between `vminps/vmaxps` and Java semantics. Perhaps a mention of the new instructions introduced in this PR might help people who are confused about the fact that `vminmax_fp` is overloaded.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1246:

> 1244:          opc == Op_MaxV || opc == Op_MaxReductionV, "sanity");
> 1245:   if (elem_bt == T_FLOAT) {
> 1246:     evminmaxps(dst, mask, src1, src2, true, opc == Op_MinV || opc == Op_MinReductionV ? 0x4 : 0x5, vlen_enc);

Perhaps `0x4` and `0x5` should be factored into named constants since they are used in multiple places and it would also help readability if one does not have the documentation handy when reading the code.

-------------

Changes requested by mhaessig (Committer).

PR Review: https://git.openjdk.org/jdk/pull/25914#pullrequestreview-2958407187
PR Review Comment: https://git.openjdk.org/jdk/pull/25914#discussion_r2166859511
PR Review Comment: https://git.openjdk.org/jdk/pull/25914#discussion_r2166896645
PR Review Comment: https://git.openjdk.org/jdk/pull/25914#discussion_r2167019420
PR Review Comment: https://git.openjdk.org/jdk/pull/25914#discussion_r2167008970
PR Review Comment: https://git.openjdk.org/jdk/pull/25914#discussion_r2166994321