RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v4]

Martin Doerr mdoerr at openjdk.java.net
Wed Nov 4 11:10:59 UTC 2020


On Wed, 4 Nov 2020 03:14:09 GMT, Ziviani <github.com+670087+jrziviani at openjdk.org> wrote:

>> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0.
>> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0.
>> Ref: PowerISA 3.1, page 129.
>> 
>> These instructions are particularly interesting to improve the following
>> pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in
>> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches.
>> 
>> Long.toString, that generate such pattern in getChars, has showed a
>> good performance gain by using these new instructions.
>> 
>> Example:
>> for (int i = 0; i < 200_000; i++)
>>   res = Long.toString((long)i);
>> 
>> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString
>> 
>> Without setbc (average): 0.1178 seconds
>> With setbc (average): 0.0396 seconds
>
> Ziviani has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision:
> 
>   8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions
>   
>   - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0.
>   - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0.
>   Ref: PowerISA 3.1, page 129.
>   
>   These instructions are particularly interesting to improve the following
>   pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in
>   `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches.
>   
>   Long.toString, that generate such pattern in getChars, has showed a
>   good performance gain by using these new instructions.
>   
>   Example:
>   for (int i = 0; i < 200_000; i++)
>     res = Long.toString((long)i);
>   
>   java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString
>   
>   Without setbc (average): 0.1178 seconds
>   With setbc (average): 0.0396 seconds

Nice work! Looks correct and much cleaner than before. I only have a few improvement requests left.

src/hotspot/cpu/ppc/macroAssembler_ppc.inline.hpp line 240:

> 238: 
> 239: // set dst to -1, 0, +1
> 240: inline void MacroAssembler::set_cmp3(Register dst) {

Please add assert_different_registers(dst, R0);

src/hotspot/cpu/ppc/macroAssembler_ppc.inline.hpp line 241:

> 239: // set dst to -1, 0, +1
> 240: inline void MacroAssembler::set_cmp3(Register dst) {
> 241:     // P10, prefer using setbc intructions

Please adapt style: 2 leading spaces in C++ code

src/hotspot/cpu/ppc/macroAssembler_ppc.inline.hpp line 246:

> 244:       setnbc(dst, CCR0, Assembler::less);
> 245:     }
> 246:     else {

Please adapt style: } else {

src/hotspot/cpu/ppc/macroAssembler_ppc.inline.hpp line 255:

> 253: 
> 254: // set dst to -1, 0, +1
> 255: inline void MacroAssembler::set_cmpu3(Register dst) {

Shorter possiblity with only 1 additional instruction on any Power version:
cror(CCR0, Assembler::less, CCR0, Assembler::summary_overflow); // treat overflow like less
set_cmp3(dst);

src/hotspot/cpu/ppc/ppc.ad line 11424:

> 11422: instruct cmpL3_reg_reg(iRegIdst dst, iRegLsrc src1, iRegLsrc src2) %{
> 11423:   match(Set dst (CmpL3 src1 src2));
> 11424:   ins_cost(DEFAULT_COST * (VM_Version::has_brw() ? 4 : 5));

"size" needs to be precise, but a rough estimate is sufficient for "ins_const". In this case CmpL3 has only one match rule, so matcher doesn't have a choice and cost is pointless. So I suggest to keep it more simple and make cost independent on has_brw.

-------------

Changes requested by mdoerr (Reviewer).

PR: https://git.openjdk.java.net/jdk/pull/907


More information about the hotspot-dev mailing list