RFR: 8265491: Math Signum optimization for x86 [v2]

Jie Fu jiefu at openjdk.java.net
Wed Apr 21 04:00:11 UTC 2021


On Tue, 20 Apr 2021 15:28:43 GMT, Marcus G K Williams <github.com+168222+mgkwill at openjdk.org> wrote:

>> x86 Math.signum() uses two floating point compares and a copy sign operation involving data movement between GPR and XMM registers.
>> 
>> We can optimize to one floating point compare and sign computation in XMM. We observe ~25% performance improvement with this optimization.
>> 
>> Base:
>> Benchmark                   Mode  Cnt  Score   Error  Units
>> Signum._1_signumFloatTest   avgt    5  4.660 ± 0.040  ns/op
>> Signum._2_overheadFloat     avgt    5  3.314 ± 0.023  ns/op
>> Signum._3_signumDoubleTest  avgt    5  4.809 ± 0.043  ns/op
>> Signum._4_overheadDouble    avgt    5  3.313 ± 0.015  ns/op
>>  
>> Optimized:
>> signum intrinsic patch
>> Benchmark                   Mode  Cnt  Score   Error  Units
>> Signum._1_signumFloatTest   avgt    5  3.782 ± 0.019  ns/op
>> Signum._2_overheadFloat     avgt    5  3.309 ± 0.011  ns/op
>> Signum._3_signumDoubleTest  avgt    5  3.782 ± 0.017  ns/op
>> Signum._4_overheadDouble    avgt    5  3.310 ± 0.006  ns/op
>> 
>> Signed-off-by: Marcus G K Williams <marcus.williams at intel.com>
>
> Marcus G K Williams has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Add os.arch=="i386" to signum jtreg
>   
>   Signed-off-by: Marcus G K Williams <marcus.williams at intel.com>

src/hotspot/cpu/x86/x86.ad line 5780:

> 5778: // --------------------------------- Signum ---------------------------
> 5779: 
> 5780: instruct signumF_reg(regF dst, regF zero, regF one, rFlagsReg cr) %{

Do we need `predicate(UseSSE>=2);` here?

src/hotspot/cpu/x86/x86.ad line 5788:

> 5786: 
> 5787:     __ ucomiss($dst$$XMMRegister, $zero$$XMMRegister);
> 5788:     __ jcc(Assembler::parity, exit);

How about checking equal first and then parity?

I think the unordered case is rare in real programs.
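
To make the suggested order concrete, here is a minimal Java sketch of the control flow, with the zero check ahead of the NaN (unordered) check. It only models the semantics of `Math.signum(float)`; the class and method names are hypothetical and this is not the intrinsic code in x86.ad.

```java
// Hypothetical model of the suggested branch order; not the HotSpot intrinsic.
final class SignumBranchOrder {
    static float signum(float x) {
        if (x == 0.0f) {
            return x;              // "equal" case first: +0.0f / -0.0f returned unchanged
        }
        if (x != x) {
            return x;              // rare "unordered" case: NaN returned unchanged
        }
        return x > 0.0f ? 1.0f : -1.0f;
    }

    public static void main(String[] args) {
        float[] inputs = {2.5f, -3.0f, 0.0f, -0.0f, Float.NaN, Float.POSITIVE_INFINITY};
        for (float x : inputs) {
            // Print the sketch result next to the library result for comparison.
            System.out.println(x + " -> " + signum(x) + " (Math.signum: " + Math.signum(x) + ")");
        }
    }
}
```

Whether the reordering helps depends on how often zero and NaN inputs occur in practice, so it would need to be measured with the benchmark above.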

src/hotspot/cpu/x86/x86.ad line 5792:

> 5790:     __ movflt($dst$$XMMRegister, $one$$XMMRegister);
> 5791:     __ jcc(Assembler::above, exit);
> 5792:     __ movflt($dst$$XMMRegister, $zero$$XMMRegister);

Is it possible to use just one instruction to assign -1 to $dst?

Maybe you can follow the approach used in negF_reg/negF_reg_reg.
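
For illustration, here is a minimal Java sketch of the idea, assuming negF_reg negates a float by XORing it with a sign-bit mask (that assumption should be checked against x86.ad). Since $dst already holds 1.0 at that point, flipping its sign bit would give -1.0 without loading a separate constant. The class, constant, and method names are hypothetical.

```java
// Hypothetical sketch: negate a float by flipping only its sign bit.
final class SignFlipSketch {
    // Mask with only the IEEE 754 single-precision sign bit set (assumed to
    // correspond to the sign-flip constant used by the negation instructs).
    static final int FLOAT_SIGN_MASK = 0x80000000;

    static float flipSign(float x) {
        // XOR on the raw bits toggles the sign and leaves exponent and mantissa intact.
        return Float.intBitsToFloat(Float.floatToRawIntBits(x) ^ FLOAT_SIGN_MASK);
    }

    public static void main(String[] args) {
        float one = 1.0f;
        System.out.println(flipSign(one));   // sign-flipped copy of 1.0f
    }
}
```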

-------------

PR: https://git.openjdk.java.net/jdk/pull/3581

