RFR: 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions [v4]
Martin Doerr
mdoerr at openjdk.java.net
Wed Nov 4 11:47:58 UTC 2020
On Wed, 4 Nov 2020 10:58:39 GMT, Martin Doerr <mdoerr at openjdk.org> wrote:
>> Ziviani has updated the pull request with a new target base due to a merge or a rebase. The incremental webrev excludes the unrelated changes brought in by the merge/rebase. The pull request contains one additional commit since the last revision:
>>
>> 8255553: [PPC64] Introduce and use setbc and setnbc P10 instructions
>>
>> - setbc RT,BI: sets RT to 1 if CR(BI) is 1, otherwise 0.
>> - setnbc RT,BI: sets RT to -1 if CR(BI) is 1, otherwise 0.
>> Ref: PowerISA 3.1, page 129.
>>
>> These instructions are particularly useful for improving the following
>> pattern `(src1<src2)? -1: ((src1>src2)? 1: 0)`, which can be found in
>> `instruct cmpL3_reg_reg_ExEx()@ppc.ad`, by removing its branches.
>>
>> Long.toString, which generates this pattern in getChars, shows a
>> good performance gain when using these new instructions.
>>
>> Example:
>> for (int i = 0; i < 200_000; i++)
>>     res = Long.toString((long)i);
>>
>> java -Xcomp -XX:CompileThreshold=1 -XX:-TieredCompilation TestToString
>>
>> Without setbc (average): 0.1178 seconds
>> With setbc (average): 0.0396 seconds
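
For readers unfamiliar with the two instructions, the three-way compare from the description above can be modelled in plain C++ roughly as follows. This is only a sketch of the semantics, not the code from ppc.ad: CrField, cmpd, and cmp3 are hypothetical names, and each ternary stands in for what would be a single branch-free instruction on Power10.

```cpp
#include <cstdint>

// Hypothetical model of one condition register field (LT, GT, EQ, SO bits).
struct CrField { bool lt, gt, eq, so; };

// Model of a signed 64-bit compare that writes a CR field.
inline CrField cmpd(int64_t a, int64_t b) {
  return CrField{a < b, a > b, a == b, false};
}

// setbc RT,BI: RT = 1 if the selected CR bit is set, otherwise 0.
inline int64_t setbc(bool cr_bit) { return cr_bit ? 1 : 0; }

// setnbc RT,BI: RT = -1 if the selected CR bit is set, otherwise 0.
inline int64_t setnbc(bool cr_bit) { return cr_bit ? -1 : 0; }

// (a < b) ? -1 : ((a > b) ? 1 : 0), expressed as one compare,
// one setbc, one setnbc, and one OR.
inline int64_t cmp3(int64_t a, int64_t b) {
  CrField cr = cmpd(a, b);
  return setbc(cr.gt) | setnbc(cr.lt);
}
```

Since LT and GT are never both set, the OR yields exactly -1, 0, or +1.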
>
> src/hotspot/cpu/ppc/macroAssembler_ppc.inline.hpp line 255:
>
>> 253:
>> 254: // set dst to -1, 0, +1
>> 255: inline void MacroAssembler::set_cmpu3(Register dst) {
>
> Shorter possibility with only 1 additional instruction on any Power version:
> cror(CCR0, Assembler::less, CCR0, Assembler::summary_overflow); // treat unordered like less
> set_cmp3(dst);
Or even better, with a parameter:
inline void MacroAssembler::set_cmpu3(Register dst, bool treat_unordered_like_less) {
  if (treat_unordered_like_less) {
    cror(CCR0, Assembler::less, CCR0, Assembler::summary_overflow); // treat unordered like less
  } else {
    cror(CCR0, Assembler::greater, CCR0, Assembler::summary_overflow); // treat unordered like greater
  }
  set_cmp3(dst);
}
This allows more cleanup in the interpreter and C1. (unordered_result is only ever +1 or -1 in TemplateTable::float_cmp, which we can assert.)
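
To illustrate the idea outside of the assembler, here is a rough standalone C++ model of what the cror does. It assumes a floating-point compare that sets only the SO (unordered) bit when either operand is NaN. CrField and fcmpu are hypothetical stand-ins for the hardware state, and set_cmp3 / set_cmpu3 here are plain functions that merely mirror the names above, not the MacroAssembler methods.

```cpp
#include <cmath>

// Hypothetical model of one condition register field (LT, GT, EQ, SO bits).
struct CrField { bool lt, gt, eq, so; };

// Model of an unordered floating-point compare: NaN operands set only SO.
inline CrField fcmpu(double a, double b) {
  bool unordered = std::isnan(a) || std::isnan(b);
  return CrField{!unordered && a < b, !unordered && a > b,
                 !unordered && a == b, unordered};
}

// -1 if LT is set, +1 if GT is set, otherwise 0.
inline int set_cmp3(const CrField& cr) {
  return (cr.gt ? 1 : 0) | (cr.lt ? -1 : 0);
}

// Fold the unordered bit into LT or GT first (the cror step),
// then reuse the ordinary three-way result.
inline int set_cmpu3(CrField cr, bool treat_unordered_like_less) {
  if (treat_unordered_like_less) {
    cr.lt = cr.lt || cr.so;  // treat unordered like less
  } else {
    cr.gt = cr.gt || cr.so;  // treat unordered like greater
  }
  return set_cmp3(cr);
}
```

With treat_unordered_like_less set, a NaN operand gives -1 (as the fcmpl/dcmpl bytecodes require); otherwise it gives +1 (as for fcmpg/dcmpg). Ordered inputs are unaffected because SO is clear.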
-------------
PR: https://git.openjdk.java.net/jdk/pull/907
More information about the hotspot-dev mailing list