RFR: 8271883: Math CopySign optimization for x86 [v2]

Fri Aug 6 01:32:36 UTC 2021

On Wed, 4 Aug 2021 21:42:47 GMT, Marcus G K Williams <mgkwill at openjdk.org> wrote:

>> The intrinsic for Math.copySign is disabled on x86_64.
>> 
>> The code C2 currently generates for float and double can be improved; this change adds optimized intrinsics for float and double Math.copySign.
>> 
>> ### **Math.copySign(double)**
>> _From:_
>>   0x00007f7d606e5dac:   vmovq  %xmm1,%r10
>>   0x00007f7d606e5db1:   vmovq  %xmm0,%r11
>>   0x00007f7d606e5db6:   movabs $0x7fffffffffffffff,%r8
>>   0x00007f7d606e5dc0:   and    %r8,%r11
>>   0x00007f7d606e5dc3:   movabs $0x8000000000000000,%r8
>>   0x00007f7d606e5dcd:   and    %r8,%r10
>>   0x00007f7d606e5dd0:   or     %r11,%r10
>>   0x00007f7d606e5dd3:   vmovq  %r10,%xmm0                   
>> 
>> _To:_
>>   0x00007fc3c14c63ac:   movabs $0x7fffffffffffffff,%r10
>>   0x00007fc3c14c63b6:   vmovq  %r10,%xmm2
>>   0x00007fc3c14c63bb:   vpternlogq $0xe4,%xmm2,%xmm1,%xmm0
>> 
>> ### **Math.copySign(float)**
>> _From:_
>>   0x00007ff8886e60ac:   vmovd  %xmm1,%r11d
>>   0x00007ff8886e60b1:   vmovd  %xmm0,%r10d
>>   0x00007ff8886e60b6:   and    $0x80000000,%r11d
>>   0x00007ff8886e60bd:   and    $0x7fffffff,%r10d
>>   0x00007ff8886e60c4:   or     %r10d,%r11d
>>   0x00007ff8886e60c7:   vmovd  %r11d,%xmm0
>> 
>> _To:_
>>   0x00007fc7d94c63ac:   mov    $0x7fffffff,%r10d
>>   0x00007fc7d94c63b2:   vmovd  %r10d,%xmm3
>>   0x00007fc7d94c63b7:   vpternlogd $0xe4,%xmm3,%xmm1,%xmm0
>> 
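A quick way to check that the single vpternlog is equivalent to the old and/and/or sequence is to emulate one VPTERNLOG lane in plain Java. This is a sketch under stated assumptions, not code from the patch. It assumes the usual reading of the AT&T operand order above: xmm0 (magnitude) is the destination/first source A, xmm1 (sign) is B, and xmm2/xmm3 (the 0x7fff... mask) is C, with each result bit taken from imm8[(A << 2) | (B << 1) | C]. With imm8 = 0xe4 that reduces to "C ? A : B" per bit, so the mask keeps the magnitude bits of xmm0 and takes the sign bit from xmm1. The class and method names are hypothetical. NaN sign arguments are left out because Math.copySign permits implementation-specific results for them.

```java
// Hypothetical sketch: emulates one lane of VPTERNLOG in plain Java to check
// that immediate 0xE4 with a 0x7fff... mask computes Math.copySign.
final class CopySignTernlogSketch {

    // For each bit i of the lane, the result bit is imm8[(a_i << 2) | (b_i << 1) | c_i],
    // where a is the destination/first source, b the second and c the third operand.
    static long ternlog(int imm8, long a, long b, long c, int width) {
        long result = 0L;
        for (int i = 0; i < width; i++) {
            long bitA = (a >>> i) & 1L;
            long bitB = (b >>> i) & 1L;
            long bitC = (c >>> i) & 1L;
            int idx = (int) ((bitA << 2) | (bitB << 1) | bitC);
            if (((imm8 >>> idx) & 1) != 0) {
                result |= 1L << i;
            }
        }
        return result;
    }

    // "To" sequence for doubles (vpternlogq): the mask 0x7fffffffffffffff, as loaded
    // by movabs in the listing, selects magnitude bits from a; the sign bit comes from b.
    static double copySignTernlog(double magnitude, double sign) {
        long bits = ternlog(0xE4,
                Double.doubleToRawLongBits(magnitude),
                Double.doubleToRawLongBits(sign),
                0x7fffffffffffffffL, 64);
        return Double.longBitsToDouble(bits);
    }

    // "To" sequence for floats (vpternlogd): same selection on a 32-bit lane.
    static float copySignTernlog(float magnitude, float sign) {
        long bits = ternlog(0xE4,
                Float.floatToRawIntBits(magnitude) & 0xffffffffL,
                Float.floatToRawIntBits(sign) & 0xffffffffL,
                0x7fffffffL, 32);
        return Float.intBitsToFloat((int) bits);
    }

    // "From" sequence for doubles: two ANDs and an OR on general-purpose registers.
    static long copySignAndOr(long magnitudeBits, long signBits) {
        return (magnitudeBits & 0x7fffffffffffffffL) | (signBits & 0x8000000000000000L);
    }

    public static void main(String[] args) {
        double[] ds = {1.5, -2.25, 0.0, -0.0, Double.MIN_VALUE, -Double.MAX_VALUE,
                       Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY};
        for (double m : ds) {
            for (double s : ds) {
                long expected = Double.doubleToRawLongBits(Math.copySign(m, s));
                long viaTernlog = Double.doubleToRawLongBits(copySignTernlog(m, s));
                long viaAndOr = copySignAndOr(Double.doubleToRawLongBits(m),
                                              Double.doubleToRawLongBits(s));
                if (viaTernlog != expected || viaAndOr != expected) {
                    throw new AssertionError("double mismatch for " + m + ", " + s);
                }
            }
        }
        float[] fs = {1.5f, -2.25f, 0.0f, -0.0f, Float.MIN_VALUE, -Float.MAX_VALUE,
                      Float.POSITIVE_INFINITY, Float.NEGATIVE_INFINITY};
        for (float m : fs) {
            for (float s : fs) {
                int expected = Float.floatToRawIntBits(Math.copySign(m, s));
                int viaTernlog = Float.floatToRawIntBits(copySignTernlog(m, s));
                if (viaTernlog != expected) {
                    throw new AssertionError("float mismatch for " + m + ", " + s);
                }
            }
        }
        System.out.println("0xE4 ternlog matches Math.copySign on all test inputs");
    }
}
```

Launched with the JDK 11+ single-file source launcher, it throws on the first mismatch and otherwise prints the final line. The design point it illustrates is that one bitwise-select instruction with a constant mask replaces the round trip through general-purpose registers (move out, and/and/or, move back).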
>> #### _**Performance of patch using updated test/micro/org/openjdk/bench/vm/compiler/Signum.java:**_
>> #### **BEFORE**
>> Signum._5_copySignFloatTest       avgt    5  2.442 ± 0.024  ns/op
>> Signum._7_copySignDoubleTest      avgt    5  2.400 ± 0.033  ns/op
>> 
>> #### **PATCH**
>> Signum._5_copySignFloatTest       avgt    5  2.029 ± 0.011  ns/op
>> Signum._7_copySignDoubleTest      avgt    5  2.029 ± 0.024  ns/op
>> 
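For scale, the relative reduction in average time per call implied by the BEFORE and PATCH means is worked out below. It is a simple ratio of the reported averages only; the ± margins before and after do not overlap for either test.

```latex
\frac{2.442 - 2.029}{2.442} \approx 16.9\%\ \text{(float)}, \qquad
\frac{2.400 - 2.029}{2.400} \approx 15.5\%\ \text{(double)}
```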
>> #### **JTREG test that covers this case:**
>> test/hotspot/jtreg/compiler/intrinsics/math/TestSignumIntrinsic.java
>> 
>> Signed-off-by: Marcus G K Williams <marcus.williams at intel.com>
>
> Marcus G K Williams has updated the pull request incrementally with one additional commit since the last revision:
> 
>   Update x86.ad with #ifdef _LP64 for copySign

src/hotspot/cpu/x86/x86.ad line 1692:

> 1690:         return false;
> 1691:       }
> 1692:       break;

This check should be in match_rule_supported(), not in match_rule_supported_vector().
It should also return false on non-_LP64 builds.

src/hotspot/cpu/x86/x86.ad line 5810:

> 5808: 
> 5809: #ifdef _LP64
> 5810: instruct copySignF_reg(regF dst, regF src, regF tmp1, rRegL tmp2) %{

tmp2 should be rRegI rather than rRegL for copySignF, since the float mask (0x7fffffff) only needs a 32-bit register.

src/hotspot/cpu/x86/x86.ad line 5814:

> 5812:   match(Set dst (CopySignF dst src));
> 5813:   effect(TEMP tmp1, TEMP tmp2);
> 5814:   format %{ "CopySignF $dst, $src" %}

Please add the tmp registers to the format, for example:
format %{ "CopySignF $dst, $src\t! using $tmp1 and $tmp2 as TEMP" %}

src/hotspot/cpu/x86/x86.ad line 5828:

> 5826:   ins_cost(125);
> 5827:   effect(TEMP tmp1, TEMP tmp2);
> 5828:   format %{ "CopySignD $dst, $src" %}

Please add the tmp1 and tmp2 registers to the format.

src/hotspot/cpu/x86/x86.ad line 5842:

> 5840:   ins_cost(100);
> 5841:   effect(TEMP tmp1, TEMP tmp2);
> 5842:   format %{ "CopySignD $dst, $src" %}

Please add the tmp1 and tmp2 registers to the format.

-------------

PR: https://git.openjdk.java.net/jdk/pull/5005