RFR: 8285868: x86_64 intrinsics for floating point methods isNaN, isFinite and isInfinite [v4]
Quan Anh Mai
duke at openjdk.java.net
Wed May 18 02:26:55 UTC 2022
On Tue, 17 May 2022 22:12:38 GMT, Srinivas Vamsi Parasa <duke at openjdk.java.net> wrote:
>> We develop optimized x86_64 intrinsics for the floating point class check methods isNaN(), isFinite() and isInfinite() for the Float and Double classes. JMH benchmarks show ~8x improvement for isNaN(), ~3x improvement for isInfinite() and 15% gain for isFinite().
>>
>>
>> Results WITH vfpclassss/sd:
>>
>> JMH Benchmark                     Baseline (ns/op)   This PR (ns/op)   Speedup
>> FloatClassCheck.testIsFinite      0.559              0.4               1.4x
>> FloatClassCheck.testIsInfinite    0.828              0.386             2.15x
>> FloatClassCheck.testIsNaN         2.589              0.387             6.7x
>> DoubleClassCheck.testIsFinite     0.568              0.414             1.37x
>> DoubleClassCheck.testIsInfinite   0.836              0.395             2.11x
>> DoubleClassCheck.testIsNaN        2.592              0.393             6.6x
>>
>> Results WITHOUT vfpclassss/sd:
>>
>> JMH Benchmark                     Baseline (ns/op)   This PR (ns/op)   Speedup
>> FloatClassCheck.testIsFinite      0.561              0.468             1.2x
>> FloatClassCheck.testIsInfinite    0.793              0.491             1.61x
>> FloatClassCheck.testIsNaN         2.587              0.469             5.5x
>> DoubleClassCheck.testIsFinite     0.561              0.592             0.94x
>> DoubleClassCheck.testIsInfinite   0.828              0.592             1.4x
>> DoubleClassCheck.testIsNaN        2.593              0.594             4.4x
>
> Srinivas Vamsi Parasa has updated the pull request incrementally with one additional commit since the last revision:
>
> Split the macros using predicate
Hi, I'm working on #8525, which also improves the performance of these methods. With that patch, `isNaN` is reduced to the optimal sequence `ucomiss x, x; jp label`. This patch still benefits the performance of `isFinite` and `isInfinite` for the float cases. For the double cases without `vfpclass`, though, I'm not sure there is a gain, because the 64-bit constants have to be materialised in registers first.
Also, could the intrinsics output their result directly in the flags register? Thanks.
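For reference, `ucomiss x, x; jp label` works because comparing a value with itself is unordered only when the value is NaN; `ucomiss` reports an unordered result by setting PF, so `jp` is taken exactly for NaN inputs. A minimal C++ sketch of the same predicate (the helper names `float_from_bits` and `is_nan_self_compare` are hypothetical, not HotSpot code):

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: build a float from a raw IEEE 754 bit pattern,
// handy for producing NaN/Inf inputs.
float float_from_bits(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Same predicate as `ucomiss x, x; jp label`: a self-comparison is
// unordered (and therefore "not equal") only when x is NaN.
bool is_nan_self_compare(float x) { return x != x; }
```

The self-comparison is only reliable when compiled without `-ffast-math`, which lets the compiler assume NaNs never occur.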
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4753:
> 4751: switch (opcode) {
> 4752: case Op_IsFiniteF:
> 4753: setb(Assembler::below, dst);
This partial write may stall later reads of `dst`; you could emit an `xor dst, dst` before the comparison (it has to come before it, because `xor` overwrites the flags).
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4785:
> 4783: kmovbl(dst, tmp);
> 4784: if (opcode == Op_IsFiniteF) {
> 4785: xorl(dst, 0x00000001);
`notl(dst)`?
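The quoted `kmovbl` + `xorl(dst, 0x00000001)` sequence can be modelled in software: `vfpclassss` sets a mask bit when its operand belongs to any class selected by the imm8, and `isFinite` is the negation of "NaN or infinite". The sketch below follows the imm8 bit assignments documented in the Intel SDM (bit 0 QNaN, 1 +0, 2 -0, 3 +Inf, 4 -Inf, 5 denormal, 6 negative finite, 7 SNaN). `fpclass_ss`, `NAN_MASK`, `INF_MASK` and the `is_*_f` wrappers are hypothetical names, and the masks may not match the immediates the patch actually uses:

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical helper: build a float from a raw IEEE 754 bit pattern.
float float_from_bits(std::uint32_t bits) {
  float f;
  std::memcpy(&f, &bits, sizeof f);
  return f;
}

// Software model of the VFPCLASSSS test: 1 if x is in any class selected by imm8.
int fpclass_ss(float x, std::uint8_t imm8) {
  std::uint32_t bits;
  std::memcpy(&bits, &x, sizeof bits);
  const bool neg       = (bits >> 31) != 0;
  const bool exp_ones  = ((bits >> 23) & 0xFFu) == 0xFFu;
  const bool exp_zeros = ((bits >> 23) & 0xFFu) == 0u;
  const bool mant_zero = (bits & 0x007FFFFFu) == 0u;
  const bool quiet     = (bits & 0x00400000u) != 0;  // top mantissa bit

  const bool classes[8] = {
      exp_ones && !mant_zero && quiet,              // bit 0: QNaN
      !neg && exp_zeros && mant_zero,               // bit 1: +0
      neg && exp_zeros && mant_zero,                // bit 2: -0
      !neg && exp_ones && mant_zero,                // bit 3: +Inf
      neg && exp_ones && mant_zero,                 // bit 4: -Inf
      exp_zeros && !mant_zero,                      // bit 5: denormal
      neg && !exp_ones && !(exp_zeros && mant_zero),// bit 6: negative finite
      exp_ones && !mant_zero && !quiet};            // bit 7: SNaN
  for (int i = 0; i < 8; ++i) {
    if (((imm8 >> i) & 1) && classes[i]) return 1;
  }
  return 0;
}

constexpr std::uint8_t NAN_MASK = 0x81;  // QNaN | SNaN (assumed)
constexpr std::uint8_t INF_MASK = 0x18;  // +Inf | -Inf (assumed)

int is_nan_f(float x)      { return fpclass_ss(x, NAN_MASK); }
int is_infinite_f(float x) { return fpclass_ss(x, INF_MASK); }
// "Neither NaN nor infinite": test both classes, then flip bit 0,
// mirroring kmovbl followed by xorl(dst, 0x00000001).
int is_finite_f(float x)   { return fpclass_ss(x, NAN_MASK | INF_MASK) ^ 1; }
```

Because the class test yields 0 or 1, XOR with 1 flips only bit 0 and the result stays a valid boolean.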
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4800:
> 4798: mov64(temp1, KILL_SIGN_MASK);
> 4799: andq(temp, temp1);
> 4800: mov64(temp2, POS_INF);
Can we use `temp1` for this, too?
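The `KILL_SIGN_MASK`/`POS_INF` sequence relies on a property of IEEE 754 bit patterns: once the sign bit is cleared, finite values compare below the +Infinity pattern, both infinities compare equal to it, and NaNs compare above it, so one unsigned integer compare classifies the value. Both constants are 64-bit immediates, which x86-64 `and`/`cmp` cannot encode directly (their immediates are at most 32 bits, sign-extended), hence the `mov64` loads. A minimal C++ sketch, assuming the two constants hold the standard binary64 patterns (their definitions are not shown in the quoted diff; the function names are hypothetical):

```cpp
#include <cstdint>
#include <cstring>

// Assumed values: the standard IEEE 754 binary64 bit patterns.
constexpr std::uint64_t KILL_SIGN_MASK = 0x7FFFFFFFFFFFFFFFULL;  // clears bit 63 (sign)
constexpr std::uint64_t POS_INF        = 0x7FF0000000000000ULL;  // bit pattern of +Infinity

// Reinterpret a double's bits as an unsigned 64-bit integer.
std::uint64_t double_bits(double d) {
  std::uint64_t bits;
  std::memcpy(&bits, &d, sizeof bits);
  return bits;
}

// Hypothetical helper: build a double from a raw bit pattern.
double double_from_bits(std::uint64_t bits) {
  double d;
  std::memcpy(&d, &bits, sizeof d);
  return d;
}

// With the sign cleared: finite < POS_INF, +/-Inf == POS_INF, NaN > POS_INF.
bool is_finite_d(double d)   { return (double_bits(d) & KILL_SIGN_MASK) <  POS_INF; }
bool is_infinite_d(double d) { return (double_bits(d) & KILL_SIGN_MASK) == POS_INF; }
bool is_nan_d(double d)      { return (double_bits(d) & KILL_SIGN_MASK) >  POS_INF; }
```

In this sequence the mask is no longer needed after the `andq`, so a single scratch register can hold both constants in turn, which appears to be the point of the `temp1` question.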
-------------
PR: https://git.openjdk.java.net/jdk/pull/8459