RFR: 8285868: x86_64 intrinsics for floating point methods isNaN, isFinite and isInfinite [v8]

Thu May 19 02:22:36 UTC 2022

On Thu, 19 May 2022 00:06:00 GMT, Srinivas Vamsi Parasa <duke at openjdk.java.net> wrote:

>> We have developed optimized x86_64 intrinsics for the floating-point class check methods `isNaN()`, `isFinite()` and `isInfinite()` of the Float and Double classes. JMH benchmarks show a ~6x improvement for `isNaN()`, a ~2x improvement for `isInfinite()` and a 40% gain for `isFinite()` using the `vfpclassss`/`vfpclasssd` instructions.
>> 
>> 
>> JMH Benchmark (ns/op)             Baseline   This PR (WITH vfpclassss/sd)      Speedup
>> FloatClassCheck.testIsFinite      0.559      0.4                               1.4x
>> FloatClassCheck.testIsInfinite    0.828      0.386                             2.15x
>> FloatClassCheck.testIsNaN         2.589      0.387                             6.7x
>> DoubleClassCheck.testIsFinite     0.568      0.414                             1.37x
>> DoubleClassCheck.testIsInfinite   0.836      0.395                             2.11x
>> DoubleClassCheck.testIsNaN        2.592      0.393                             6.6x
>> 
>> JMH Benchmark (ns/op)             Baseline   This PR (WITHOUT vfpclassss/sd)   Speedup
>> FloatClassCheck.testIsFinite      0.561      0.468                             1.2x
>> FloatClassCheck.testIsInfinite    0.793      0.491                             1.61x
>> FloatClassCheck.testIsNaN         2.587      0.469                             5.5x
>> DoubleClassCheck.testIsFinite     0.561      0.592                             0.94x
>> DoubleClassCheck.testIsInfinite   0.828      0.592                             1.4x
>> DoubleClassCheck.testIsNaN        2.593      0.594                             4.4x
>
> Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits:
> 
>  - add comment for vfpclasss/d for isFinite()
>  - Merge branch 'master' of https://git.openjdk.java.net/jdk into float
>  - zero out the upper bits not written by setb
>  - use 0x1 to be simpler
>  - remove the redundant temp register
>  - Split the macros using predicate
>  - update jmh tests
>  - Merge branch 'master' into float
>  - 8285868: x86_64 intrinsics for floating point methods isNaN, isFinite and isInfinite
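
For reference, a single category test can answer all three predicates at once, which is what makes the `vfpclassss`/`vfpclasssd` path attractive. Below is a minimal scalar Java model of that idea, assuming the VFPCLASSSS/SD imm8 category layout from the Intel SDM (bit 0 QNaN, bit 3 +Inf, bit 4 -Inf, bit 7 SNaN). `FpClassModel` and its mask names are hypothetical and this is not the code in the PR; only the four categories these predicates need are modeled.

```java
// Hypothetical scalar model (not the PR's code) of answering isNaN, isInfinite
// and isFinite with one category test, in the style of vfpclassss/vfpclasssd.
// Only the QNaN, +Inf, -Inf and SNaN categories are modeled.
final class FpClassModel {
    // Category bits, assuming the VFPCLASSSS/SD imm8 layout from the Intel SDM.
    static final int QNAN = 0x01, POS_INF = 0x08, NEG_INF = 0x10, SNAN = 0x80;
    static final int NAN_MASK = QNAN | SNAN;                // 0x81
    static final int INF_MASK = POS_INF | NEG_INF;          // 0x18
    static final int NON_FINITE_MASK = NAN_MASK | INF_MASK; // 0x99

    // Category bits for f; every finite value (zero, denormal, normal) maps to 0 here.
    static int classify(float f) {
        int bits = Float.floatToRawIntBits(f);
        int exp = (bits >>> 23) & 0xFF;
        int mant = bits & 0x7FFFFF;
        if (exp != 0xFF) return 0;
        if (mant == 0) return bits < 0 ? NEG_INF : POS_INF;
        return (mant & 0x400000) != 0 ? QNAN : SNAN;        // top fraction bit = quiet bit
    }

    static int classify(double d) {
        long bits = Double.doubleToRawLongBits(d);
        int exp = (int) ((bits >>> 52) & 0x7FF);
        long mant = bits & ((1L << 52) - 1);                // low 52 fraction bits
        if (exp != 0x7FF) return 0;
        if (mant == 0) return bits < 0 ? NEG_INF : POS_INF;
        return (mant & (1L << 51)) != 0 ? QNAN : SNAN;
    }

    static boolean isNaN(float f)       { return (classify(f) & NAN_MASK) != 0; }
    static boolean isInfinite(float f)  { return (classify(f) & INF_MASK) != 0; }
    // isFinite is the complement of "NaN or infinite" (mask 0x99).
    static boolean isFinite(float f)    { return (classify(f) & NON_FINITE_MASK) == 0; }

    static boolean isNaN(double d)      { return (classify(d) & NAN_MASK) != 0; }
    static boolean isInfinite(double d) { return (classify(d) & INF_MASK) != 0; }
    static boolean isFinite(double d)   { return (classify(d) & NON_FINITE_MASK) == 0; }

    public static void main(String[] args) {
        float[] fs = {0f, -0f, 1.5f, -Float.MIN_VALUE, Float.MAX_VALUE,
                      Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY, Float.NaN};
        for (float f : fs) {
            if (isNaN(f) != Float.isNaN(f) || isInfinite(f) != Float.isInfinite(f)
                    || isFinite(f) != Float.isFinite(f)) {
                throw new AssertionError("mismatch for " + f);
            }
        }
        double[] ds = {0d, -0d, -2.5d, Double.MIN_VALUE, -Double.MAX_VALUE,
                       Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, Double.NaN};
        for (double d : ds) {
            if (isNaN(d) != Double.isNaN(d) || isInfinite(d) != Double.isInfinite(d)
                    || isFinite(d) != Double.isFinite(d)) {
                throw new AssertionError("mismatch for " + d);
            }
        }
        System.out.println("model agrees with java.lang.Float/Double");
    }
}
```

Running `main` cross-checks the model against `Float`/`Double` `isNaN`, `isInfinite` and `isFinite` for both zeros, a denormal, the extremes, both infinities and NaN.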

Please remove the `isNaN` intrinsics in favour of #8525.

Also, you should not use `andl(dst, 0xff)` to zero out the upper bits of `dst`: it is a 32-bit read following an 8-bit write, which constitutes a partial register stall.

Refer to section 3.5.2.4, "Partial register stalls", of the Intel® 64 and IA-32 Architectures Optimization Reference Manual:

> A partial register stall happens when an instruction refers to a register, portions of which were previously modified by other instructions.

There are two options worth considering here:

- Zeroing the register before the `setb` instruction. From the same section:

    > For optimal performance, use of zero idioms, before the use of the register, eliminates the need for partial register merge micro-ops.

    This is preferable since the zero idiom does not add an executed uop in the backend (it still consumes a slot in the decoder and the uop cache).

- Zero-extending the register after the `setb` instruction (e.g. `movzbl(dst, dst)`). This is less optimal since the zero extension adds latency and a real executed uop in the backend.
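
To make the value semantics concrete, here is a toy Java model (hypothetical class `SetbZeroingModel`, not HotSpot code). It assumes `setb` acts as a byte-sized set-on-condition write into a 32-bit register, and shows why the upper bits need clearing at all and that the zero idiom, `movzbl` and `andl` all produce the same value. It cannot show the stall itself, which is purely a microarchitectural cost.

```java
// Toy value model (not HotSpot code): the contents of a 32-bit register after an
// 8-bit set-on-condition write, with and without the zeroing options above.
final class SetbZeroingModel {
    // Assumed behavior: the 8-bit write replaces bits 0..7; bits 8..31 keep old contents.
    static int setb(int reg, boolean cond) {
        return (reg & ~0xFF) | (cond ? 1 : 0);
    }

    // No zeroing: stale upper bits leak into the 32-bit result.
    static int setbOnly(int reg, boolean cond) {
        return setb(reg, cond);
    }

    // Option 1: zero idiom (xorl dst, dst) before setb.
    static int zeroIdiomThenSetb(int reg, boolean cond) {
        reg = 0;                    // xorl dst, dst
        return setb(reg, cond);
    }

    // Option 2: setb, then zero-extend the low byte (movzbl dst, dst).
    static int setbThenMovzbl(int reg, boolean cond) {
        reg = setb(reg, cond);
        return reg & 0xFF;          // movzbl: the source operand is only the low byte
    }

    // andl(dst, 0xff) after setb: same value as option 2. The review's objection
    // is the 32-bit read after an 8-bit write, which a value model cannot show.
    static int setbThenAndl(int reg, boolean cond) {
        reg = setb(reg, cond);
        return reg & 0xFF;          // andl dst, 0xff
    }

    public static void main(String[] args) {
        int stale = 0xDEADBEEF;     // arbitrary leftover register contents
        System.out.printf("setb only:      0x%08X%n", setbOnly(stale, true));
        System.out.printf("xorl + setb:    0x%08X%n", zeroIdiomThenSetb(stale, true));
        System.out.printf("setb + movzbl:  0x%08X%n", setbThenMovzbl(stale, true));
        System.out.printf("setb + andl:    0x%08X%n", setbThenAndl(stale, true));
    }
}
```

With the stale value `0xDEADBEEF`, `main` prints `0xDEADBE01` for the no-zeroing case and `0x00000001` for the other three.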

Thanks.

-------------

PR: https://git.openjdk.java.net/jdk/pull/8459