RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v4]
Martin Doerr
mdoerr at openjdk.java.net
Wed Nov 4 11:10:59 UTC 2020
On Wed, 4 Nov 2020 03:14:09 GMT, Ziviani <github.com+670087+jrziviani at openjdk.org> wrote:
>> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0.
>> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0.
>> Ref: PowerISA 3.1, page 129.
>>
>> These instructions are particularly interesting to improve the following
>> pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in
>> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches.
>>
>> Long.toString, that generate such pattern in getChars, has showed a
>> good performance gain by using these new instructions.
>>
>> Example:
>> for (int i = 0; i < 200_000; i++)
>> res = Long.toString((long)i);
>>
>> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString
>>
>> Without setbc (average): 0.1178 seconds
>> With setbc (average): 0.0396 seconds
>
> Ziviani has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision:
>
> 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions
>
> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0.
> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0.
> Ref: PowerISA 3.1, page 129.
>
> These instructions are particularly interesting to improve the following
> pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in
> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches.
>
> Long.toString, that generate such pattern in getChars, has showed a
> good performance gain by using these new instructions.
>
> Example:
> for (int i = 0; i < 200_000; i++)
> res = Long.toString((long)i);
>
> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString
>
> Without setbc (average): 0.1178 seconds
> With setbc (average): 0.0396 seconds
Nice work! Looks correct and much cleaner than before. I only have a few improvement requests left.
src/hotspot/cpu/ppc/macroAssembler_ppc.inline.hpp line 240:
> 238:
> 239: // set dst to -1, 0, +1
> 240: inline void MacroAssembler::set_cmp3(Register dst) {
Please add assert_different_registers(dst, R0);
src/hotspot/cpu/ppc/macroAssembler_ppc.inline.hpp line 241:
> 239: // set dst to -1, 0, +1
> 240: inline void MacroAssembler::set_cmp3(Register dst) {
> 241: // P10, prefer using setbc intructions
Please adapt style: 2 leading spaces in C++ code
src/hotspot/cpu/ppc/macroAssembler_ppc.inline.hpp line 246:
> 244: setnbc(dst, CCR0, Assembler::less);
> 245: }
> 246: else {
Please adapt style: } else {
src/hotspot/cpu/ppc/macroAssembler_ppc.inline.hpp line 255:
> 253:
> 254: // set dst to -1, 0, +1
> 255: inline void MacroAssembler::set_cmpu3(Register dst) {
Shorter possiblity with only 1 additional instruction on any Power version:
cror(CCR0, Assembler::less, CCR0, Assembler::summary_overflow); // treat overflow like less
set_cmp3(dst);
src/hotspot/cpu/ppc/ppc.ad line 11424:
> 11422: instruct cmpL3_reg_reg(iRegIdst dst, iRegLsrc src1, iRegLsrc src2) %{
> 11423: match(Set dst (CmpL3 src1 src2));
> 11424: ins_cost(DEFAULT_COST * (VM_Version::has_brw() ? 4 : 5));
"size" needs to be precise, but a rough estimate is sufficient for "ins_const". In this case CmpL3 has only one match rule, so matcher doesn't have a choice and cost is pointless. So I suggest to keep it more simple and make cost independent on has_brw.
-------------
Changes requested by mdoerr (Reviewer).
PR: https://git.openjdk.java.net/jdk/pull/907
More information about the hotspot-dev
mailing list