RFR: 8285868: x86_64 intrinsics for floating point methods isNaN, isFinite and isInfinite [v4]

Wed May 18 02:26:55 UTC 2022

On Tue, 17 May 2022 22:12:38 GMT, Srinivas Vamsi Parasa <duke at openjdk.java.net> wrote:

>> We develop optimized x86_64 intrinsics for the floating point class check methods isNaN(), isFinite() and isInfinite() for Float and Double classes. JMH benchmarks show ~8x improvement for isNaN(), ~3x improvement for isInfinite() and 15% gain for isFinite().
>> 
>> 
>> JMH Benchmark (ns/op)             Baseline   This PR (WITH vfpclassss/sd)      Speedup
>> FloatClassCheck.testIsFinite      0.559      0.4                               1.4x
>> FloatClassCheck.testIsInfinite    0.828      0.386                             2.15x
>> FloatClassCheck.testIsNaN         2.589      0.387                             6.7x
>> DoubleClassCheck.testIsFinite     0.568      0.414                             1.37x
>> DoubleClassCheck.testIsInfinite   0.836      0.395                             2.11x
>> DoubleClassCheck.testIsNaN        2.592      0.393                             6.6x
>> 
>> JMH Benchmark (ns/op)             Baseline   This PR (WITHOUT vfpclassss/sd)   Speedup
>> FloatClassCheck.testIsFinite      0.561      0.468                             1.2x
>> FloatClassCheck.testIsInfinite    0.793      0.491                             1.61x
>> FloatClassCheck.testIsNaN         2.587      0.469                             5.5x
>> DoubleClassCheck.testIsFinite     0.561      0.592                             0.94x
>> DoubleClassCheck.testIsInfinite   0.828      0.592                             1.4x
>> DoubleClassCheck.testIsNaN        2.593      0.594                             4.4x
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Split the macros using predicate

Hi, I'm working on #8525, which also improves the performance of these methods. With that patch, `isNaN` is reduced to the optimal sequence `ucomiss x, x; jp label`. Your patch would still benefit `isFinite` and `isInfinite` in the float cases. For the double cases without `vfpclass`, though, I'm not sure it would, because the long constants have to be materialised.
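
To spell out why a self-compare is enough for `isNaN`: NaN is the only value that compares unordered with itself, so `x != x` is true exactly for NaN. Here is a minimal Java sketch of that property. The class name `NaNSelfCompare` is made up for illustration and is not code from either patch.

```java
// Illustrative only: NaNSelfCompare is a hypothetical name, not part of #8525 or this PR.
final class NaNSelfCompare {
    // NaN is the only floating-point value that compares unequal to itself,
    // which is the property an unordered self-compare (ucomiss x, x) exposes.
    static boolean isNaN(float x) {
        return x != x;
    }

    static boolean isNaN(double x) {
        return x != x;
    }

    public static void main(String[] args) {
        float[] samples = {0.0f, -0.0f, 1.5f, Float.POSITIVE_INFINITY, Float.NaN};
        for (float s : samples) {
            // Print our result next to the library's for comparison.
            System.out.println(s + " -> " + isNaN(s) + " (Float.isNaN: " + Float.isNaN(s) + ")");
        }
    }
}
```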

Also, can we output the result of the intrinsics directly in the flags register? Thanks.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4753:

> 4751:   switch (opcode) {
> 4752:     case Op_IsFiniteF:
> 4753:       setb(Assembler::below, dst);

This partial write may stall later reads of `dst`. You could emit an `xor dst, dst` before doing the comparison.

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4785:

> 4783:   kmovbl(dst, tmp);
> 4784:   if (opcode == Op_IsFiniteF) {
> 4785:     xorl(dst, 0x00000001);

`notl(dst)`?

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4800:

> 4798:   mov64(temp1, KILL_SIGN_MASK);
> 4799:   andq(temp, temp1);
> 4800:   mov64(temp2, POS_INF);

Can we use `temp1` for this, too?
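
For context, the masked comparison in this hunk can be sketched in Java as below. I'm assuming `KILL_SIGN_MASK` and `POS_INF` hold the usual IEEE 754 binary64 values (every bit except the sign, and the bit pattern of +Infinity); I haven't checked them against the definitions in the patch. The class and method names are made up for illustration.

```java
// Illustrative only: DoubleClassCheckSketch is a hypothetical name, not code from the patch.
final class DoubleClassCheckSketch {
    // Assumed values: names mirror the constants in the quoted hunk.
    static final long KILL_SIGN_MASK = 0x7FFFFFFFFFFFFFFFL; // clears the sign bit
    static final long POS_INF = 0x7FF0000000000000L;        // bits of +Infinity

    static boolean isInfinite(double d) {
        // Drop the sign so that +Infinity and -Infinity share one bit pattern.
        long magnitude = Double.doubleToRawLongBits(d) & KILL_SIGN_MASK;
        return magnitude == POS_INF;
    }

    static boolean isFinite(double d) {
        // With the sign cleared the value is non-negative, so a signed compare is safe.
        // Finite values sort below the infinity pattern, NaNs sort above it.
        long magnitude = Double.doubleToRawLongBits(d) & KILL_SIGN_MASK;
        return magnitude < POS_INF;
    }
}
```

The mask is only needed for the `and`. After that, only the infinity pattern is compared against, which is why one scratch register might be able to hold both constants in turn.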

-------------

PR: https://git.openjdk.java.net/jdk/pull/8459