RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v7]
Ziviani
github.com+670087+jrziviani at openjdk.java.net
Mon Nov 16 21:55:07 UTC 2020
On Mon, 16 Nov 2020 20:46:27 GMT, Corey Ashford <github.com+51754783+CoreyAshford at openjdk.org> wrote:
>> Ziviani has refreshed the contents of this pull request, and previous commits have been removed. The incremental views will show differences compared to the previous content of the PR. The pull request contains one new commit since the last revision:
>>
>> 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions
>>
>> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0.
>> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0.
>> Ref: PowerISA 3.1, page 129.
>>
>> These instructions are particularly interesting to improve the following
>> pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in
>> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches.
>>
>> Long.toString, which generates such a pattern in getChars, has shown a
>> good performance gain by using these new instructions.
>>
>> Example:
>>   for (int i = 0; i < 200_000; i++)
>>     res = Long.toString((long)i);
>>
>> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString
>>
>> Without setbc (average): 0.1178 seconds
>> With setbc (average): 0.0396 seconds
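
For readers following the thread, the semantics described in the commit message above can be modelled roughly as follows. This is only an illustrative Java sketch, not the code in ppc.ad: the class name SetbcSketch and the helpers setbc, setnbc and cmp3 are hypothetical, the CR bit is modelled as a boolean, and combining the two results with OR is an assumption about one possible way to build the three-way compare. The Java ternaries stand in for the hardware instructions, which themselves do not branch.

```java
public class SetbcSketch {
    // Models setbc RT,BI: RT becomes 1 if the CR bit is set, otherwise 0.
    static long setbc(boolean crBit) {
        return crBit ? 1L : 0L;
    }

    // Models setnbc RT,BI: RT becomes -1 if the CR bit is set, otherwise 0.
    static long setnbc(boolean crBit) {
        return crBit ? -1L : 0L;
    }

    // Three-way compare: (src1 < src2) ? -1 : ((src1 > src2) ? 1 : 0).
    // A single compare yields the "less than" and "greater than" bits;
    // at most one of them is set, so OR-ing the two results is safe.
    static int cmp3(long src1, long src2) {
        boolean lt = src1 < src2; // stands in for the CR0 "lt" bit
        boolean gt = src1 > src2; // stands in for the CR0 "gt" bit
        return (int) (setnbc(lt) | setbc(gt));
    }

    public static void main(String[] args) {
        System.out.println(cmp3(1L, 2L));
        System.out.println(cmp3(2L, 1L));
        System.out.println(cmp3(5L, 5L));
    }
}
```

The point of the sketch is that once the compare has set the condition bits, the result needs no conditional jumps: each bit is turned directly into 0, 1 or -1.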
>
> src/hotspot/cpu/ppc/ppc.ad line 11425:
>
>> 11423: match(Set dst (CmpL3 src1 src2));
>> 11424: effect(KILL cr0);
>> 11425: ins_cost(DEFAULT_COST * 5);
>
> Should this depend on P10 vs. P9 since the instruction cost changes by 1?
As per Martin:
> "size" needs to be precise, but a rough estimate is sufficient for "ins_cost". In this case CmpL3 has only one match rule, so the matcher doesn't have a choice and the cost is pointless. So I suggest keeping it simpler and making the cost independent of has_brw.
> src/hotspot/cpu/ppc/ppc.ad line 11760:
>
>> 11758: match(Set dst (CmpF3 src1 src2));
>> 11759: effect(KILL cr0);
>> 11760: ins_cost(DEFAULT_COST * 6);
>
> Should this depend on P10 vs. P9 because of the different number of instructions needed? Maybe an approx. value is enough when other paths can't come close to competing.
Same as above.
> src/hotspot/cpu/ppc/ppc.ad line 11844:
>
>> 11842: match(Set dst (CmpD3 src1 src2));
>> 11843: effect(KILL cr0);
>> 11844: ins_cost(DEFAULT_COST * 6);
>
> Same question here about P10 vs. P9 regarding cost
Same as above.
-------------
PR: https://git.openjdk.java.net/jdk/pull/907