RFR: 8285868: x86_64 intrinsics for floating point methods isNaN, isFinite and isInfinite [v8]
Jatin Bhateja
jbhateja at openjdk.java.net
Thu May 19 07:12:50 UTC 2022
On Thu, 19 May 2022 00:06:00 GMT, Srinivas Vamsi Parasa <duke at openjdk.java.net> wrote:
>> We develop optimized x86_64 intrinsics for the floating point class check methods `isNaN()`, `isFinite()` and `isInfinite()` for the Float and Double classes. JMH benchmarks show a ~6x improvement for `isNaN()`, a ~2x improvement for `isInfinite()` and a 40% gain for `isFinite()` using the `vfpclasss(s/d)` instructions.
>>
>>
>> JMH Benchmark (ns/op)             Baseline   This PR (WITH vfpclassss/sd)      Speedup
>>
>> FloatClassCheck.testIsFinite      0.559      0.4                               1.4x
>> FloatClassCheck.testIsInfinite    0.828      0.386                             2.15x
>> FloatClassCheck.testIsNaN         2.589      0.387                             6.7x
>> DoubleClassCheck.testIsFinite     0.568      0.414                             1.37x
>> DoubleClassCheck.testIsInfinite   0.836      0.395                             2.11x
>> DoubleClassCheck.testIsNaN        2.592      0.393                             6.6x
>>
>> JMH Benchmark (ns/op)             Baseline   This PR (WITHOUT vfpclassss/sd)   Speedup
>>
>> FloatClassCheck.testIsFinite      0.561      0.468                             1.2x
>> FloatClassCheck.testIsInfinite    0.793      0.491                             1.61x
>> FloatClassCheck.testIsNaN         2.587      0.469                             5.5x
>> DoubleClassCheck.testIsFinite     0.561      0.592                             0.94x
>> DoubleClassCheck.testIsInfinite   0.828      0.592                             1.4x
>> DoubleClassCheck.testIsNaN        2.593      0.594                             4.4x
>
> Srinivas Vamsi Parasa has updated the pull request with a new target base due to a merge or a rebase. The pull request now contains ten commits:
>
> - add comment for vfpclasss/d for isFinite()
> - Merge branch 'master' of https://git.openjdk.java.net/jdk into float
> - zero out the upper bits not written by setb
> - use 0x1 to be simpler
> - remove the redundant temp register
> - Split the macros using predicate
> - update jmh tests
> - Merge branch 'master' into float
> - 8285868: x86_64 intrinsics for floating point methods isNaN, isFinite and isInfinite
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 4750:
> 4748: movdl(temp, src);
> 4749: andl(temp, KILL_SIGN_MASK);
> 4750: cmpl(temp, POS_INF);
For isNaN, the following sequence will offer better latency:
"vucomiss src_xmm, src_xmm"
"setp r8"
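
To illustrate why the two sequences are interchangeable for isNaN, here is a minimal Java-level sketch (not the HotSpot code itself). The first method mirrors the integer path quoted above (move the bits, clear the sign, compare against +Inf); the second mirrors the suggestion, since vucomiss of a register with itself reports "unordered" (parity flag set) only when the value is NaN. The constant values, the "greater than" condition applied after the compare, and the class and method names are assumptions made for this illustration; they are not taken from the patch.

```java
public class FloatClassCheckSketch {
    // Assumed values: the patch names these constants but the quoted
    // snippet does not show their definitions.
    static final int KILL_SIGN_MASK = 0x7FFFFFFF; // clears the sign bit
    static final int POS_INF = 0x7F800000;        // bit pattern of +Infinity

    // Mirrors movdl/andl/cmpl: a float is NaN when its magnitude bits
    // are strictly greater than the bit pattern of +Infinity.
    static boolean isNaNViaBits(float f) {
        int bits = Float.floatToRawIntBits(f);
        return (bits & KILL_SIGN_MASK) > POS_INF;
    }

    // Mirrors vucomiss src, src + setp: a value compares unordered
    // with itself only when it is NaN.
    static boolean isNaNViaSelfCompare(float f) {
        return f != f;
    }

    public static void main(String[] args) {
        float[] samples = {
            0.0f, -0.0f, 1.5f, Float.MAX_VALUE, Float.MIN_VALUE,
            Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY,
            Float.NaN, Float.intBitsToFloat(0xFFC00001) // NaN with sign bit set
        };
        for (float f : samples) {
            System.out.println(f + ": bits=" + isNaNViaBits(f)
                    + " selfCompare=" + isNaNViaSelfCompare(f)
                    + " Float.isNaN=" + Float.isNaN(f));
        }
    }
}
```

Both methods should agree with `Float.isNaN` on every input; the self-compare form stays in the XMM register and avoids the move to a general purpose register plus the mask, which is presumably where the latency benefit comes from.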
-------------
PR: https://git.openjdk.java.net/jdk/pull/8459