RFR: 8265491: Math Signum optimization for x86 [v8]
Sandhya Viswanathan
sviswanathan at openjdk.java.net
Tue Apr 27 18:15:40 UTC 2021
On Sat, 24 Apr 2021 00:21:49 GMT, Marcus G K Williams <github.com+168222+mgkwill at openjdk.org> wrote:
>> On x86, Math.signum() uses two floating-point compares and a copy-sign operation that moves data between GPR and XMM registers.
>>
>> We can optimize to one floating point compare and sign computation in XMM. We observe ~25% performance improvement with this optimization.
>>
>> Base:
>>
>> Benchmark                   Mode  Cnt  Score   Error  Units
>> Signum._1_signumFloatTest   avgt    5  4.660 ± 0.040  ns/op
>> Signum._2_overheadFloat     avgt    5  3.314 ± 0.023  ns/op
>> Signum._3_signumDoubleTest  avgt    5  4.809 ± 0.043  ns/op
>> Signum._4_overheadDouble    avgt    5  3.313 ± 0.015  ns/op
>>
>>
>> Optimized:
>> signum intrinsic patch
>>
>> Benchmark                   Mode  Cnt  Score   Error  Units
>> Signum._1_signumFloatTest   avgt    5  3.769 ± 0.015  ns/op
>> Signum._2_overheadFloat     avgt    5  3.312 ± 0.025  ns/op
>> Signum._3_signumDoubleTest  avgt    5  3.765 ± 0.005  ns/op
>> Signum._4_overheadDouble    avgt    5  3.309 ± 0.010  ns/op
>>
>>
>> Signed-off-by: Marcus G K Williams <marcus.williams at intel.com>
>
> Marcus G K Williams has updated the pull request incrementally with one additional commit since the last revision:
>
> Fix copyright
>
> Signed-off-by: Marcus G K Williams <marcus.williams at intel.com>
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1065:
> 1063:
> 1064: jcc(Assembler::equal, DONE_LABEL);
> 1065: jcc(Assembler::parity, DONE_LABEL);
Please add comments here explaining that the equal branch handles the special case of +0.0/-0.0 and the parity branch handles NaN:
If the argument is positive zero or negative zero, then the result is the same as the argument.
If the argument is NaN, then the result is NaN.
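
To make the two special cases concrete, here is a minimal Java sketch of the semantics the two branches have to preserve. It is not the HotSpot code: the class and method names are hypothetical, and building the result by OR-ing the argument's sign bit into 1.0f is an assumption chosen for illustration, not necessarily what the patch emits.

```java
public class SignumSketch {
    // Hypothetical helper mirroring the branch structure under review.
    static float signumSketch(float x) {
        // Corresponds to the "equal" branch: +0.0f and -0.0f both compare
        // equal to zero, and the argument is returned with its sign intact.
        if (x == 0.0f) {
            return x;
        }
        // Corresponds to the "parity" branch: an unordered compare means
        // the argument is NaN, which is returned unchanged.
        if (x != x) {
            return x;
        }
        // Remaining case (assumption for illustration): take the bits of
        // 1.0f and copy the sign bit of the argument into them.
        int oneBits = Float.floatToRawIntBits(1.0f);
        int signBit = Float.floatToRawIntBits(x) & 0x80000000;
        return Float.intBitsToFloat(oneBits | signBit);
    }

    public static void main(String[] args) {
        float[] inputs = {0.0f, -0.0f, Float.NaN, 3.5f, -2.0f,
                          Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY};
        for (float in : inputs) {
            // Print the sketch next to Math.signum for comparison.
            System.out.println(in + " -> " + signumSketch(in)
                    + " (Math.signum: " + Math.signum(in) + ")");
        }
    }
}
```

Comparing raw bits rather than values is the way to tell +0.0f from -0.0f in a test, since the two compare equal.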
src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1076:
> 1074:
> 1075: if (opcode == Op_SignumF){
> 1076: xorps(dst, ExternalAddress(StubRoutines::x86::vector_float_sign_flip()), scratch);
vector_float_sign_flip is only 64-bit aligned, whereas the SSE versions of xorps and xorpd need a 128-bit aligned memory address.
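
As a small illustration of why 64-bit alignment is not enough, the sketch below checks a made-up address against both alignments. The class name, the helper, and the address value are hypothetical and only demonstrate the arithmetic; they say nothing about where the stub constant actually lives.

```java
public class AlignmentSketch {
    // Hypothetical helper: true if address is a multiple of alignmentBytes,
    // which must be a power of two.
    static boolean isAligned(long address, int alignmentBytes) {
        return (address & (alignmentBytes - 1)) == 0;
    }

    public static void main(String[] args) {
        long address = 0x1008L; // made-up address for illustration
        // 64-bit (8-byte) alignment holds for this address ...
        System.out.println("8-byte aligned:  " + isAligned(address, 8));
        // ... but 128-bit (16-byte) alignment does not.
        System.out.println("16-byte aligned: " + isAligned(address, 16));
    }
}
```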
-------------
PR: https://git.openjdk.java.net/jdk/pull/3581