RFR: 8265491: Math Signum optimization for x86 [v8]

Tue Apr 27 18:15:40 UTC 2021

On Sat, 24 Apr 2021 00:21:49 GMT, Marcus G K Williams <github.com+168222+mgkwill at openjdk.org> wrote:

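>> On x86, Math.signum() currently uses two floating-point compares and a copy-sign operation that moves data between general-purpose (GPR) and XMM registers.
>> 
>> We can optimize this to a single floating-point compare, with the sign computed entirely in XMM registers. We observe roughly a 25% performance improvement: the average time per call drops from 4.660 to 3.769 ns/op for float and from 4.809 to 3.765 ns/op for double.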
>> 
>> Base:
>> 
>> Benchmark                   Mode  Cnt  Score   Error  Units
>> Signum._1_signumFloatTest   avgt    5  4.660 ± 0.040  ns/op
>> Signum._2_overheadFloat     avgt    5  3.314 ± 0.023  ns/op
>> Signum._3_signumDoubleTest  avgt    5  4.809 ± 0.043  ns/op
>> Signum._4_overheadDouble    avgt    5  3.313 ± 0.015  ns/op
>> 
>> Optimized (signum intrinsic patch):
>> 
>> Benchmark                   Mode  Cnt  Score   Error  Units
>> Signum._1_signumFloatTest   avgt    5  3.769 ± 0.015  ns/op
>> Signum._2_overheadFloat     avgt    5  3.312 ± 0.025  ns/op
>> Signum._3_signumDoubleTest  avgt    5  3.765 ± 0.005  ns/op
>> Signum._4_overheadDouble    avgt    5  3.309 ± 0.010  ns/op
>> 
>> 
>> Signed-off-by: Marcus G K Williams <marcus.williams at intel.com>
>
> Marcus G K Williams has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Fix copyright
>   
>   Signed-off-by: Marcus G K Williams <marcus.williams at intel.com>
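
As a rough illustration of the single-compare approach summarized above, here is a scalar C++ model. It is not the HotSpot code: signum_model, flip_sign_bit and signum_copysign are hypothetical names, and the XOR of the sign bit stands in for the xorps with a sign-flip constant. In the generated code the zero, NaN and sign tests all read the flags of one compare; the C++ spells them out as separate expressions.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

// Flip the IEEE-754 sign bit of a float (models XORPS with a 0x80000000 mask).
static float flip_sign_bit(float v) {
  std::uint32_t bits;
  std::memcpy(&bits, &v, sizeof bits);
  bits ^= 0x80000000u;
  std::memcpy(&v, &bits, sizeof bits);
  return v;
}

// Hypothetical model of the intrinsic's control flow (not HotSpot code).
float signum_model(float x) {
  // "equal or unordered": +0.0, -0.0 and NaN are returned unchanged.
  if (x == 0.0f || std::isnan(x)) {
    return x;
  }
  float result = 1.0f;               // materialize +1.0
  if (!(x > 0.0f)) {                 // x is non-zero and not NaN, so x < 0
    result = flip_sign_bit(result);  // +1.0 -> -1.0 without touching GPRs
  }
  return result;
}

// Copy-sign formulation with the same special cases, for comparison.
float signum_copysign(float x) {
  if (x == 0.0f || std::isnan(x)) {
    return x;
  }
  return std::copysign(1.0f, x);
}
```

Both functions should agree on every input; the model only changes how the sign of the result is produced.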

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1065:

> 1063: 
> 1064:   jcc(Assembler::equal, DONE_LABEL);
> 1065:   jcc(Assembler::parity, DONE_LABEL);

Please add comments here explaining that the equal branch handles the special case of +0.0/-0.0 and the parity branch handles NaN, as required by the Math.signum() specification:
If the argument is positive zero or negative zero, then the result is the same as the argument.
If the argument is NaN, then the result is NaN.
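
Why these two conditions cover exactly those cases follows from how UCOMISS reports its result in ZF, PF and CF. The sketch below models those flags in C++ as described in the Intel SDM; Flags, ucomiss_model and the cond_* helpers are hypothetical names. Note that in this model a NaN input also sets ZF, so it satisfies equal as well; the parity branch is what makes the NaN case explicit.

```cpp
#include <cassert>
#include <cmath>

// EFLAGS bits written by UCOMISS/UCOMISD (simplified model):
//   unordered (a NaN operand): ZF=1 PF=1 CF=1
//   greater than:              ZF=0 PF=0 CF=0
//   less than:                 ZF=0 PF=0 CF=1
//   equal:                     ZF=1 PF=0 CF=0
struct Flags {
  bool zf, pf, cf;
};

Flags ucomiss_model(float a, float b) {
  if (std::isnan(a) || std::isnan(b)) return {true, true, true};
  if (a > b) return {false, false, false};
  if (a < b) return {false, false, true};
  return {true, false, false};  // equal, including -0.0 vs +0.0
}

// Conditions tested by the jcc sequence under review.
bool cond_equal(Flags f)  { return f.zf; }            // jcc(Assembler::equal, ...)
bool cond_parity(Flags f) { return f.pf; }            // jcc(Assembler::parity, ...)
bool cond_above(Flags f)  { return !f.cf && !f.zf; }  // jcc(Assembler::above, ...)
```

Only a zero or NaN argument reaches DONE_LABEL through the first two jumps; positive and negative values fall through to the sign computation.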

src/hotspot/cpu/x86/c2_MacroAssembler_x86.cpp line 1076:

> 1074: 
> 1075:   if (opcode == Op_SignumF){
> 1076:     xorps(dst, ExternalAddress(StubRoutines::x86::vector_float_sign_flip()), scratch);

vector_float_sign_flip is only 64-bit aligned, whereas the SSE (non-VEX) versions of xorps and xorpd require a 128-bit (16-byte) aligned memory operand; a misaligned operand raises a general-protection fault.
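
A small C++ sketch of the alignment rule, under stated assumptions: kFloatSignFlip128 is a hypothetical constant (four copies of the float sign bit), not the actual StubRoutines data. It shows that an address which is 8-byte aligned is not necessarily 16-byte aligned, which is the property a legacy-SSE 128-bit memory operand needs. VEX-encoded vxorps/vxorpd do not have this requirement.

```cpp
#include <cassert>
#include <cstdint>

// Legacy-SSE 128-bit memory operands must sit on a 16-byte boundary.
bool is_sse_m128_aligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % 16 == 0;
}

bool is_8_byte_aligned(const void* p) {
  return reinterpret_cast<std::uintptr_t>(p) % 8 == 0;
}

// Hypothetical sign-flip mask for a 128-bit operand, forced to 16-byte alignment.
alignas(16) static const std::uint32_t kFloatSignFlip128[4] = {
    0x80000000u, 0x80000000u, 0x80000000u, 0x80000000u};
```

Declaring the constant with 16-byte alignment (or using the VEX form) avoids the fault; merely 8-byte alignment is not enough.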

-------------

PR: https://git.openjdk.java.net/jdk/pull/3581