RFR: 8352585: Add special case handling for Float16.max/min x86 backend [v2]

Jatin Bhateja jbhateja at openjdk.org
Tue Mar 25 08:34:27 UTC 2025


On Tue, 25 Mar 2025 00:16:14 GMT, Sandhya Viswanathan <sviswanathan at openjdk.org> wrote:

>> Jatin Bhateja has updated the pull request incrementally with one additional commit since the last revision:
>> 
>>   Minor cleanup
>
> src/hotspot/cpu/x86/assembler_x86.cpp line 13758:
> 
>> 13756:   attributes.set_is_evex_instruction();
>> 13757:   attributes.set_embedded_opmask_register_specifier(mask);
>> 13758:   attributes.reset_is_clear_context();
> 
> Why do we do reset_is_clear_context here? We want kdst bits to be set/reset and no merge context.

Actually, it's not relevant in this case. The EVEX.Z bit is used to select between merging and zeroing semantics w.r.t. a vector destination. For an opmask destination we always set the [bits corresponding to masked out lanes to zero](https://www.felixcloutier.com/x86/vcmpph#:~:text=CMP_OPERATOR%20tsrc2%0A%20%20%20%20ELSE-,DEST.bit%5Bj%5D%20%3A%3D%200,-DEST%5BMAXKL%2D1)
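
To illustrate the point, here is a minimal scalar model of a masked compare that writes an opmask register. It is only a sketch: the function name `masked_compare_to_kreg` is hypothetical, and plain `float` lanes stand in for FP16 values. It follows the linked pseudo-code, where a masked-out lane always yields a zero bit, so there is no merge variant to choose.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model, not HotSpot code: compare two lane arrays under a
// write mask and return the resulting opmask bits (at most 32 lanes).
template <typename Pred>
std::uint32_t masked_compare_to_kreg(const float* src1, const float* src2,
                                     std::size_t lanes, std::uint32_t writemask,
                                     Pred pred) {
    std::uint32_t kdst = 0;  // bits at and above `lanes` stay zero
    for (std::size_t j = 0; j < lanes; ++j) {
        const bool active = ((writemask >> j) & 1u) != 0;
        // Active lane: bit is the compare result.
        // Masked-out lane: bit is always zero; the old kdst value is never kept.
        if (active && pred(src1[j], src2[j])) {
            kdst |= (std::uint32_t{1} << j);
        }
    }
    return kdst;
}
```

Since the previous contents of the destination never feed into the result, a merging/zeroing selector has nothing to act on in this model.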

> src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 7093:
> 
>> 7091: }
>> 7092: 
>> 7093: void C2_MacroAssembler::scalar_max_min_fp16(int opcode, XMMRegister dst, XMMRegister src1, XMMRegister src2,
> 
> Any reason we are not doing this on lines of scalar emit_fp_min_max? For most common cases emit_fp_min_max based sequence would have much better latency.

We don't need any blend emulation on CPUs supporting AVX512-FP16; it's specific to E-core targets.
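
For context, here is a rough scalar sketch of why max/min needs special-case handling at all. It assumes the special cases in question are the usual Java ones (NaN propagation, and +0.0 ordered above -0.0) and uses `float` as a stand-in for FP16. Both function names are hypothetical, and this models semantics only, not the emitted instruction sequence.

```cpp
#include <cmath>
#include <limits>

// Model of a raw x86 scalar max (MAXSS-style): the second operand is
// returned when either input is NaN or when both inputs are zeros.
inline float x86_max_model(float a, float b) {
    return (a > b) ? a : b;
}

// Model of Java-style max semantics: NaN wins, and +0.0 beats -0.0.
inline float java_max_model(float a, float b) {
    if (std::isnan(a) || std::isnan(b)) {
        return std::numeric_limits<float>::quiet_NaN();
    }
    if (a == 0.0f && b == 0.0f) {
        // Both are zeros (of either sign): prefer the positive one.
        return std::signbit(a) ? b : a;
    }
    return (a > b) ? a : b;
}
```

The two models disagree exactly on the NaN and signed-zero inputs, which are the cases a backend has to cover with extra instructions.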

-------------

PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2011566840
PR Review Comment: https://git.openjdk.org/jdk/pull/24169#discussion_r2011566750


More information about the hotspot-compiler-dev mailing list